Optimizing auditory input for foreign language learners through a verbotonal-based dichotic listening approach

The quality of the physical language signals to which learners are exposed and which result in neurobiological activity leading to perception constitutes a variable that is rarely, if ever, considered in the context of language learning. It deserves some attention. The current study identifies an optimal audio language input signal for Chinese EFL/ESL learners generated by modifying the physical features of language-bearing audio signals. This is achieved by applying the principles of verbotonalism in a dichotic listening context. Low-pass filtered (320 Hz cut-off) and unfiltered speech signals were dichotically and diotically directed to each hemisphere of the brain through the contralateral ear. Temporal and spatial neural signatures for the processing of the signals were detected in a combined event-related potential (ERP) and functional magnetic resonance imaging (fMRI) experiment. Results showed that the filtered stimuli in the left ear and unfiltered in the right ear (FL-R) configuration provided optimal auditory language input by actively exploiting left-hemispheric dominance for language processing and right-hemispheric dominance for melodic processing, i.e., each hemisphere was fed the signals that it should be best equipped to process—and it actually did so effectively. In addition, the filtered stimuli in the right ear and unfiltered in the left ear (L-FR) configuration was identified as entirely non-optimal for language learners. Other outcomes included significant load reduction through exposure to both-ear-filtered FL-FR signals as well as the confirmation that non-language signals were recognized by the brain as irrelevant to language and did not trigger any language processing. These various outcomes will necessarily entail further research.

foreign language teachers to put much emphasis on higher-order cognitive skills, such as applying rules, communicating in oral and written forms, translating to or from the target language, or even reading and thinking critically in a foreign language (Galotti, 2017; Levine, 2009). These seem to place heavy demands on foreign language learners with intermediate or lower levels of language proficiency. However, the importance of raising students' awareness of foreign language signals has tended to be neglected. These foreign language signals trigger biological activity in the brain, the primary perceiver of language signals during the learning process. As a consequence, the perception of physical language signals (i.e., physical language input) resulting from neurobiological activity necessarily plays as important a role as higher-level cognitive skills, especially for foreign language learners. Without such perception (and accompanying neurobiological activity), no learning could possibly occur. The notion of language input that we are describing here bears little resemblance to traditional ways of discussing input in language learning (e.g., Krashen, 1982, 1985, 2003), which are exclusively linguistic. However, there is one point of concurrence. Krashen seeks to identify the best form of linguistic input for language learners (using the i + 1 metaphor). We will seek the best form of physical auditory input using instrumental studies to guide our decision.
Identifying the optimal language input that is best suited for neural processing will help to enable effective learning by acting maximally on the brain's neuroplasticity. The determination of optimal language input will take as its starting point the Verbotonal theory of perception (Guberina & Asp, 1981). The Verbotonal theory was designed to provide language learners and hearing-impaired subjects with optimal language input and considers language/speech development as a meaning-making process (Lian, 1980, 2004, 2011, 2014). In addition, the way a speaker produces speech reflects how he/she perceives speech (Guberina & Asp, 1981). Thus, changing learners' perceptions of speech would give rise to changes in their production of speech. In other words, the correction of the learners' speech would be based upon the correction of their perception of speech (a neurobiological activity). The Verbotonal approach emphasizes that prosodic information (intonation and rhythm), contained in low frequencies, conveys meanings and changes learners' perception and production of speech (Guberina & Asp, 1981; Kim & Asp, 2002). The reason is that the cochlea and the vestibular organ develop from the feeling of speech prosody in low frequencies during fetal life, which is the basis of the proprioceptive memory and auditory-memory development (Asp, 2006). Thus, our ears are sensitive to the changes of pitch, rhythm, and intonation via low frequencies. Low-frequency signals are derived from a low-pass filter at a cutoff frequency of about 320 Hz, in which the fundamental frequency (F0) and prosodic features (stress, rhythm, loudness, and intonation) are maintained but the frequencies (above 320 Hz) that make words identifiable are removed (Lian & Sussex, 2018).
In both the clinical practice of speech-language pathology and the practice of foreign language learning, the Verbotonal approach with low-pass filtered signals has been identified as effective: for rehabilitating hearing-impaired children (e.g., Asp et al., 2003; Jurjević-Grkinić et al., 2015) and for teaching ESL learners. In recent experiments in China, ESL learners improved significantly after the Verbotonal approach was implemented, including in general English speaking skills (He, 2014; He et al., 2015; Yang, 2016; Yang et al., 2017), pronunciation correction (Wen et al., 2020), and phonological working memory (Yang, 2016; Yang et al., 2017). Previous studies provide empirical evidence that the Verbotonal approach with low-pass filtered signals offers an optimal model for language perception and production and restructures learners' perceptions of speech. However, research on the neural processing mechanisms of the low-pass filtered signals is still scarce. Thus, the current study aims to unveil a Chinese ESL learner's brain activity in response to low-pass filtered and unfiltered signals in a combined ERP (event-related potential) and fMRI (functional magnetic resonance imaging) investigation.
As mentioned earlier, an optimal language input signal ought to be the signal that is best suited for the brain to process. This principle will form the basis for the research reported here. Specifically, our assumption is based on ear advantage for linguistic and melodic signal processing. In this context, the underlying neural mechanisms suggest that the left hemisphere that connects primarily to the right ear is dominant in the processing of language-bearing signals (e.g., Tervaniemi & Hugdahl, 2003;Vigneau et al., 2006), and the right hemisphere that links to the left ear is predominant in the melodic and speech intonation processing (e.g., Meyer et al., 2002;Sammler et al., 2015). As a consequence, and in order to offer the brain a presumably optimal signal for processing, a dichotic listening approach is adopted. This selectively directs low-pass filtered language signals with only prosodic features of speech through the left ear to the right brain, and an unfiltered language-bearing signal through the right ear to the left brain. It is assumed that, in this configuration, optimal language signals are sent to each hemisphere of the brain. Detailed descriptions of the sentences used as auditory stimuli and the configurations of the stimuli will be illustrated in "Auditory language stimuli".
ERP and fMRI technologies rely on noninvasive techniques widely used to track brain responses to language signals. ERP measures electrophysiological responses to cognitive-related events, presented by brain wave activities and the components elicited by the experimental tasks or events (Antonenko et al., 2014;Luck, 2014). A negative wave peaking at 400 ms is known as N400, which is usually elicited by semantic violations in a sentence; and a positive wave peaking at 600 ms is identified as P600 and is elicited by a syntactic violation (Daltrozzo & Conway, 2014;Kutas & Federmeier, 2011;Morgan-Short & Tanner, 2014;Swaab et al., 2012). These two ERP components are related to semantic and syntactic components of language and will be used extensively in this study. These elicited components can be analyzed by their amplitude, latency, and distribution to help understand the cognitive processes related to the experimental tasks or events (Antonenko et al., 2014). fMRI detects changes in blood oxygenation in response to neural activation with high spatial-resolution images (Kuhl & Rivera-Gaxiola, 2008;Logothetis, 2012). The brain regions activated by a specific task or activity are determined by the blood oxygenation level-dependent (BOLD) signals (Indefrey, 2012;Poldrack, 2018). It is expected that functional localization for processing the low-pass filtered and unfiltered language signals will be revealed in the fMRI component of the experiment. The combined ERP and fMRI approach employed in this case study aims to provide temporal and spatial neural signatures for the processing of the low-pass filtered and unfiltered English sentences under dichotic listening conditions in a Chinese ESL student. As a result, it ought to be possible to identify the characteristics of an optimal language auditory input signal for Chinese ESL learners.

Participant
A male graduate student aged 27 years took part in the current study. He was strongly right-handed, with a laterality index of 100 (Oldfield, 1971), reported no history of neurological or psychiatric disease, and had normal hearing and vision. Mandarin Chinese was his first language, and he had been learning English as a foreign language for 15 years. For English language proficiency, he had passed the College English Test-Band 4 (CET is the standardized English proficiency test for college students in China; Band 4 is the medium level of the test) and self-evaluated his English proficiency as intermediate level. The participant gave written informed consent before participating in the study, as approved by the local Ethics Committee.

Auditory language stimuli
The listening materials were complete English sentences extracted from the Cambridge Preliminary English Test (PET) with B1/intermediate difficulty level. Since the participant's actual English listening proficiency was basically consistent with the difficulty level of PET, listening materials included in Objective PET Teacher's Book (4th Edition) (Hashemi & Thomas, 2013) would impose neither too heavy nor too light a load on the participant. The sentences were pronounced by two native speakers at a rate of 200 words per minute approximately. Sentences used as auditory stimuli are listed in Table 1. All auditory stimuli were designed to be 3000 ms in the experiment, during which sentence signals lasted 2638.33 ± 147.30 ms, followed by a silence of 361.67 ± 147.30 ms.
Page 5 of 20 Cai et al. Asian. J. Second. Foreign. Lang. Educ. (2021) 6:14
The auditory stimuli consisted of filtered and unfiltered sounds in both ears, organized into four configurations: (a) filtered stimuli in both channels (FL-FR); (b) filtered stimuli in the left channel and unfiltered in the right channel (FL-R); (c) unfiltered stimuli in the left channel and filtered in the right channel (L-FR); (d) unfiltered stimuli in both channels (NL-NR). All auditory stimuli were edited using Adobe Audition (Version 11.1.0; https://adobe.com/products/audition) at a 44.1 kHz sampling rate in a 32-bit stereo audio track. Low-pass filtering was obtained by setting a cut-off frequency of 320 Hz as per standard practice in previous verbotonal experiments. Frequencies above 320 Hz were removed from the sound signals and frequencies below 320 Hz were maintained. Amplitudes of the unfiltered stimuli were normalized to 70%, and amplitudes of the low-pass filtered signals were normalized to 85% to ensure equal intensity, as filtering led to energy loss (Meyer et al., 2002). Figure 1 shows a spectrogram of a sentence with the FL-R configuration.
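As a rough illustration of the four configurations, the sketch below low-pass filters a mono signal at 320 Hz and routes filtered or unfiltered copies to the left and right channels. It is not the authors' Adobe Audition procedure: the windowed-sinc FIR design, the tap count, and all function names are assumptions made for illustration, and the amplitude normalization step is omitted.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=201):
    """Windowed-sinc (Hamming) low-pass FIR coefficients with unity DC gain.
    The design and tap count are illustrative assumptions."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / fs_hz  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def apply_fir(signal, taps):
    """Convolve with the filter, compensating its delay; edges are zero-padded."""
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

# (filter left ear?, filter right ear?) for each configuration named in the text
CONFIGS = {
    "FL-FR": (True, True),
    "FL-R": (True, False),
    "L-FR": (False, True),
    "NL-NR": (False, False),
}

def make_configuration(name, mono, fs_hz, cutoff_hz=320.0, num_taps=201):
    """Return (left, right) channels for one configuration of a mono signal."""
    filter_left, filter_right = CONFIGS[name]
    filtered = apply_fir(mono, lowpass_fir(cutoff_hz, fs_hz, num_taps))
    left = filtered if filter_left else list(mono)
    right = filtered if filter_right else list(mono)
    return left, right
```

For example, `make_configuration("FL-R", mono, fs_hz=44100)` would return a filtered left channel and an untouched right channel. At 44.1 kHz a considerably longer filter (more taps) would be needed for a comparably sharp cut-off, because the transition width of this design scales with the sampling rate divided by the number of taps.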

ERP experimental design
An oddball paradigm with frequent non-target standard stimuli, less frequent target deviant stimuli and rare non-target novel stimuli was adopted in the current study. As the research focus of the current study was the neural processing of low-pass filtered auditory language stimuli under dichotic conditions, the deviant stimuli were designed to be FL-R, L-FR, and FL-FR signals in three separate runs of ERP recordings. Standard stimuli consisted of NL-NR signals and novel stimuli consisted of non-linguistic environmental sounds (running water, train noises, and birds singing). 210 auditory stimuli were presented in each run, including 150 standard NL-NR stimuli (with a frequency of occurrence of 71.4%), 50 deviant stimuli with one of the FL-R, L-FR, and FL-FR configurations (23.8%), and 10 novel stimuli of environmental sounds (4.8%). Stimulus presentation was controlled by E-Prime 3.0 (https://pstnet.com/products/e-prime), and all auditory stimuli were presented randomly and continuously to the participant. Each run took 630 s and the whole ERP experiment took around 40 min (Fig. 2).
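The trial counts, proportions, and run duration given above can be checked with a short sketch. It assumes simple unconstrained shuffling; the actual E-Prime randomization settings are not described in the text, and the function names are hypothetical.

```python
import random
from collections import Counter

STIMULUS_DURATION_S = 3.0  # each stimulus slot lasts 3000 ms

def build_oddball_run(deviant, n_standard=150, n_deviant=50, n_novel=10, seed=None):
    """Return a shuffled list of trial labels for one ERP run.
    Unconstrained shuffling is an assumption, not the documented procedure."""
    if deviant not in {"FL-R", "L-FR", "FL-FR"}:
        raise ValueError("unknown deviant configuration: %r" % deviant)
    trials = ["NL-NR"] * n_standard + [deviant] * n_deviant + ["novel"] * n_novel
    random.Random(seed).shuffle(trials)
    return trials

def summarize_run(trials):
    """Return (counts, percentage shares rounded to 0.1, run duration in seconds)."""
    counts = Counter(trials)
    shares = {label: round(100.0 * n / len(trials), 1) for label, n in counts.items()}
    return counts, shares, len(trials) * STIMULUS_DURATION_S

counts, shares, duration_s = summarize_run(build_oddball_run("FL-R", seed=1))
print(shares["NL-NR"], shares["FL-R"], shares["novel"], duration_s)
# prints: 71.4 23.8 4.8 630.0
```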

ERP data acquisition
Fig. 1 Spectrogram of the FL-R stimulus "I preferred baseball when I was at school."
The participant was fitted with an electrode cap with tin electrodes based on the International 10-20 System (Klem et al., 1999). Data were acquired from sixteen electrode sites: Fp1, Fp2, F3, Fz, F4, T3, C3, Cz, C4, T4, P3, Pz, P4, O1, Oz, and O2 (as shown in Fig. 3). Auditory stimuli were presented via stereo headphones (Sennheiser HD 435), and the participant was asked to listen to the signals and close his eyes to avoid eyeblinks. During data recording, electrode impedance was maintained below 20 kΩ. A 30-s recording of resting-state EEG data was used as a control (baseline) signal. EEGLAB (Version 15.0.0b; https://sccn.ucsd.edu/eeglab) was used to analyze the EEG data. A band-pass filter of 0.1-70.0 Hz was adopted to filter the off-line data, and artifacts above 100 μV were rejected.
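The artifact-rejection step can be sketched as follows. This is not the EEGLAB routine used in the study: it assumes that "above 100 μV" refers to the absolute amplitude of any sample in an epoch, it leaves out the band-pass filtering, and the function names are hypothetical.

```python
def reject_artifacts(epochs_uv, threshold_uv=100.0):
    """Keep only epochs whose absolute amplitude never exceeds the threshold.
    Treating the criterion as an absolute-value limit is an assumption."""
    return [epoch for epoch in epochs_uv
            if max(abs(v) for v in epoch) <= threshold_uv]

def average_epochs(epochs_uv):
    """Point-by-point mean across equal-length epochs, i.e. the ERP waveform."""
    if not epochs_uv:
        raise ValueError("no epochs left to average")
    n = len(epochs_uv)
    return [sum(samples) / n for samples in zip(*epochs_uv)]

# Toy data: three single-channel epochs (in microvolts), one containing an artifact.
epochs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 150.0, 0.0]]
clean = reject_artifacts(epochs)
print(len(clean), average_epochs(clean))
# prints: 2 [2.0, 3.0, 4.0]
```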

ERP data analysis
The ERP components of N400 and P600 are language-related ERP components regarding semantic and syntactic manipulations (Daltrozzo & Conway, 2014;Kutas & Federmeier, 2011;Morgan-Short & Tanner, 2014;Swaab et al., 2012). The amplitudes of N400 and P600 reflected mental workload for sentential processing since larger N400 amplitudes were elicited by semantically less predictable and difficult information, and larger P600 amplitudes were evoked by syntactically difficult or less-preferred sentences (Kutas & Federmeier, 2011;Swaab et al., 2012). The latencies of N400 and P600 components reflected the stimulus evaluation process and the relative timing of the response to the signals (Swaab et al., 2012). Scalp topographies of N400 and P600 components demonstrated distributions of the electrical activity over the scalp that were elicited by the auditory stimuli. In addition, the N400 and P600 effects that presented typical centroparietal scalp distribution were maximal at midline centroparietal sites (e.g., Brouwer & Hoeks, 2013;Brouwer et al., 2012;Swaab et al., 2012;van Herten et al., 2005). Thus, the measures of amplitudes, latencies, and scalp topographies of N400 and P600 components within 1000 ms after stimulus onset at the midline sites (Fz, Cz, and Pz) were taken for main analyses in the current study.
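A minimal sketch of the amplitude and latency measures follows, assuming each component is quantified as the most negative (N400) or most positive (P600) sample inside a search window. The window limits and the sampling rate used in the example are hypothetical, since the text only states that measures were taken within 1000 ms after stimulus onset.

```python
import math

def peak_in_window(wave_uv, fs_hz, start_ms, end_ms, polarity):
    """Return (amplitude_uv, latency_ms) of the extreme sample in the window.
    Sample 0 is assumed to coincide with stimulus onset."""
    if polarity not in ("negative", "positive"):
        raise ValueError("polarity must be 'negative' or 'positive'")
    first = int(round(start_ms * fs_hz / 1000.0))
    last = min(int(round(end_ms * fs_hz / 1000.0)), len(wave_uv) - 1)
    if first > last:
        raise ValueError("window lies outside the waveform")
    pick = min if polarity == "negative" else max
    idx = pick(range(first, last + 1), key=lambda i: wave_uv[i])
    return wave_uv[idx], idx * 1000.0 / fs_hz

# Synthetic 1000 ms waveform at an assumed 250 Hz sampling rate:
# a negative deflection centred on 400 ms and a positive one on 600 ms.
fs = 250
wave = [-5.0 * math.exp(-((i * 4 - 400) / 50.0) ** 2)
        + 4.0 * math.exp(-((i * 4 - 600) / 50.0) ** 2) for i in range(251)]
n400 = peak_in_window(wave, fs, 300, 500, "negative")  # hypothetical window
p600 = peak_in_window(wave, fs, 500, 800, "positive")  # hypothetical window
```

Here `n400[1]` and `p600[1]` hold the peak latencies in milliseconds. Scalp topographies would additionally require the same measure at every electrode site.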

fMRI experiment

fMRI experimental design
A block design was adopted in the fMRI experiment because, compared to other paradigms, it is well suited to detecting the brain regions activated by particular tasks/stimuli (Donaldson, 2004; Petersen & Dubis, 2012). Four runs of fMRI scanning examined the four configurations of auditory signals (i.e., FL-FR, FL-R, L-FR, and NL-NR) respectively. The block design contained rest and stimulus blocks in each run, in which one rest/stimulus block lasted 12 s. A twelve-block design, i.e., six rest blocks and six stimulus blocks, was used as the paradigm, with the blocks presented continuously and alternately. This took 144 s in each run of fMRI scanning for each configuration of auditory language signals (as shown in Fig. 4a).
Each stimulus block contained one sentence that was repeated four times as four auditory trials lasting 12 s. Six sentences were included in the six stimulus blocks respectively. 12-s rest blocks were intervals between the stimulus blocks as the baseline. The rest-stimulus block design is illustrated in Fig. 4b.
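The timing of one fMRI run can be laid out with a short sketch. Whether a run started with a rest block or a stimulus block is not stated in the text, so starting with rest is an assumption here, and the function names are hypothetical.

```python
BLOCK_S = 12.0  # duration of every rest or stimulus block
TRIAL_S = 3.0   # duration of one auditory trial (one sentence presentation)

def build_block_design(n_pairs=6, rest_first=True):
    """Return a list of (onset_s, duration_s, label) alternating rest and stimulus.
    The rest-first ordering is an assumption."""
    order = ("rest", "stimulus") if rest_first else ("stimulus", "rest")
    return [(i * BLOCK_S, BLOCK_S, order[i % 2]) for i in range(2 * n_pairs)]

def trial_onsets(blocks):
    """Onset times (s) of the auditory trials inside the stimulus blocks."""
    onsets = []
    for onset, duration, label in blocks:
        if label == "stimulus":
            n_trials = int(duration // TRIAL_S)
            onsets.extend(onset + k * TRIAL_S for k in range(n_trials))
    return onsets

blocks = build_block_design()
run_length_s = blocks[-1][0] + blocks[-1][1]
print(len(blocks), run_length_s, len(trial_onsets(blocks)))
# prints: 12 144.0 24
```

Each stimulus block holds four 3-s trials of the same sentence, which gives 24 trials per run across the six sentences.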

fMRI data acquisition
All images were acquired by using a General Electric MR750w 3.0 T MRI scanner (GE, USA). The participant wore MRI-compatible pneumatic in-ear headphones in the scanner. His head was positioned in the head coil and secured with foam padding. A compression alarm ball was placed in the participant's dominant right hand. During scanning, the participant was asked to be relaxed and listen to each signal without performing other tasks.

fMRI data analysis
The fMRI data were analyzed by using the Statistical Parametric Mapping software (SPM12; https://www.fil.ion.ucl.ac.uk/spm/software/spm12) (Friston et al., 1995; Penny et al., 2011). Data preprocessing included slice timing correction, realignment for estimation and reslicing, normalization, and smoothing. For statistical analysis, the probability threshold was set at p < 0.001 (uncorrected), and a 10-voxel cluster threshold was applied to the pattern of BOLD-fMRI activation. The functional activation images were superimposed on the three-dimensional anatomical images to generate the functional and anatomical images, which were then normalized into the Montreal Neurological Institute (MNI) stereotactic space for anatomical localization of the activated brain regions. The xjView toolbox (Version 9.7; https://www.alivelearn.net/xjview) was adopted to visualize cerebral activations induced by the four configurations of the auditory language stimuli in the current study.
Fig. 4 fMRI experimental design. a A twelve-block design for the fMRI experiment. b The rest-stimulus block design for the fMRI experiment
Table 2 illustrates the amplitudes and latencies of N400 and P600 components elicited by the FL-R, L-FR, FL-FR, and NL-NR stimuli in three separate runs of recording. As N400 and P600 components were typically distributed in the centroparietal areas and reached the maximum at the midline electrode sites (Brouwer & Hoeks, 2013; Brouwer et al., 2012; Swaab et al., 2012; van Herten et al., 2005), analyses of the amplitudes and latencies of N400 and P600 were carried out at the midline sites of Fz, Cz, and Pz. The environmental sounds (running water, train noises, and birds singing) used as novel stimuli in the three runs of ERP recording did not elicit any language-related N400 and P600 effects, thus confirming the connection of N400 and P600 with language.

ERP data
In the first run of the experiment, regarding the FL-R configuration, the N400 amplitudes induced by the unfiltered NL-NR stimuli at the midline sites were smaller than those elicited by the FL-R signals. For the P600 component, however, the FL-R stimuli elicited smaller amplitudes than the NL-NR stimuli did. The latencies of the N400 and P600 components showed that FL-R elicited a longer response time than the NL-NR signals, except that the N400 latency elicited by FL-R at the Pz site was slightly shorter. As for the L-FR configuration, the L-FR stimuli induced larger amplitudes of both N400 and P600 components in the centroparietal areas compared with NL-NR. In addition, the L-FR signals elicited shorter latencies of the N400 component but longer P600 latencies at the midline sites relative to the NL-NR stimuli. Compared to the NL-NR signals, FL-FR elicited smaller N400 amplitudes, and smaller P600 amplitudes were also induced by the FL-FR signals, except at the Pz site, which showed a slightly larger amplitude. N400 latencies induced by FL-FR were longer at the Fz and Cz sites but shorter at the Pz site relative to NL-NR. P600 latencies showed that FL-FR elicited a longer response time than NL-NR did.
The absolute voltage values of ERPs elicited by the FL-R, L-FR, FL-FR, and NL-NR stimuli at the midline sites (Fz, Cz, and Pz sites) are plotted in Fig. 5. ERP waves are plotted in the FL-R, L-FR, and FL-FR configurations respectively, corresponding to three separate runs of ERP recording. Green waves refer to the deviant stimuli (i.e., the FL-R, L-FR, and FL-FR stimuli in three runs of the experiment respectively) and blue waves represent the standard NL-NR stimuli in three runs. Since the difference waves between deviant and standard stimuli provide more information on the impact of linguistic manipulations on brain responses than the absolute voltage values do (Morgan-Short & Tanner, 2014), the difference waves between FL-R/L-FR/FL-FR and NL-NR are displayed as red waves to identify the N400 and P600 effects. N400 and P600 effects were elicited due to linguistic (semantic and syntactic) anomalies relative to well-formed linguistic signals (Kutas & Federmeier, 2011; Swaab et al., 2012). From the waveforms in the FL-R configuration, FL-R elicited larger N400 amplitudes compared to NL-NR, which indicates a higher processing load of semantic manipulation elicited by the FL-R signals. The N400 effect was elicited by FL-R at around 500 ms after the stimulus onset at the Pz site. For P600, however, FL-R did not elicit any obvious P600 effect, since smaller amplitudes were elicited at the midline sites relative to NL-NR. This indicated that FL-R elicited a lower load of syntactic manipulation than NL-NR. The FL-R configuration shows that the brain actively engages in semantic processing, which carries a higher processing load compared to NL-NR, while the FL-R signals are syntactically easier to process.
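The difference waves referred to above are obtained by subtracting the standard (NL-NR) waveform from the deviant waveform point by point. The sketch below illustrates this together with a mean amplitude over a time window, which is one common way to summarize an effect; the window limits and the sampling rate in the example are hypothetical and are not taken from the study.

```python
def difference_wave(deviant_uv, standard_uv):
    """Deviant minus standard, sample by sample (both in microvolts)."""
    if len(deviant_uv) != len(standard_uv):
        raise ValueError("waveforms must have the same length")
    return [d - s for d, s in zip(deviant_uv, standard_uv)]

def mean_amplitude(wave_uv, fs_hz, start_ms, end_ms):
    """Mean voltage between start_ms and end_ms (inclusive), onset at sample 0."""
    first = int(round(start_ms * fs_hz / 1000.0))
    last = min(int(round(end_ms * fs_hz / 1000.0)), len(wave_uv) - 1)
    segment = wave_uv[first:last + 1]
    if not segment:
        raise ValueError("window lies outside the waveform")
    return sum(segment) / len(segment)

# Toy waveforms sampled at an assumed 1000 Hz (one sample per millisecond).
deviant = [0.0, -2.0, -4.0, 1.0]
standard = [0.0, -1.0, -1.0, 2.0]
diff = difference_wave(deviant, standard)
print(diff, mean_amplitude(diff, 1000, 1, 2))
# prints: [0.0, -1.0, -3.0, -1.0] -2.0
```

A negative mean in an N400 window, or a positive mean in a P600 window, would correspond to the effects read off the red waves in Fig. 5.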
Regarding the L-FR configuration, both N400 and P600 effects were elicited by the L-FR signals at the midline sites. Unlike in the FL-R and FL-FR configurations, the difference waves between L-FR and NL-NR fluctuated considerably at the three midline sites. This indicates that the L-FR signals are unusual and unexpected for the brain to process; as a result, the brain struggles with the L-FR stimuli.
The waveforms in the FL-FR configuration showed that smaller N400 amplitudes were elicited by FL-FR compared to NL-NR, and no N400 effect was found at the midline sites. However, a P600 effect induced by the FL-FR signals was observed at the Pz site. This indicated that the FL-FR stimuli were semantically easier for the brain to process than NL-NR, since the FL-FR signals were prosodic signals without identifiable lexical information. However, the FL-FR stimuli, carrying only intonation and rhythm information, were unusual for the participant in terms of syntactic processing.
Topographic maps of the ERP components N400 and P600 that were elicited by the standard and deviant stimuli in three runs of the experiment were plotted in Fig. 6. Since the novel stimuli of environmental sounds did not elicit the N400 and P600 components during three runs of recording, the scalp topographies of ERP components elicited by the environmental sounds were not plotted in Fig. 6. In the topographic maps, the N400 component with negative voltage values was presented on the left, and the positive P600 component was shown on the right.
As standard stimuli in the experiment, the unfiltered NL-NR stimuli maintained all frequencies of the auditory signals so that each word in the sentences could be identified. Thus, a smaller processing load and a shorter response time for semantic manipulation were elicited by the NL-NR signals relative to the FL-R stimuli. But it showed a quite small mental workload for syntactic processing of the FL-R signals though the response time to FL-R was longer compared to NL-NR. Topographic maps showed that the N400 and P600 components elicited by NL-NR were distributed in central and frontal areas symmetrically, but FL-R induced left-lateralized patterns for semantic and syntactic processing in the frontal area peaking at 584 ms and 719 ms respectively. The L-FR signals elicited larger amplitudes of N400 and P600 components, which indicated that a heavier mental load was required for semantic and syntactic manipulations relative to the NL-NR signals. This may result from the L-FR signals violating the left ear advantage for prosodic information and the right ear advantage for linguistic signals (Meyer et al., 2002;Sammler et al., 2015;Tervaniemi & Hugdahl, 2003;Vigneau et al., 2006). L-FR sent prosodic signals to the right ear and linguistic information to the left ear, which is presumably a non-optimal signal for the brain to process. As to latencies, a shorter response time elicited by L-FR was observed for semantic processing, but a longer response time was detected for syntactic manipulation compared to the NL-NR signals. Topographic maps indicated that NL-NR in the second run of the experiment elicited N400 and P600 with general symmetrical distributions in the central areas, similar to the distributions in the first run of the experiment. In addition, the N400 component elicited by the L-FR signals was distributed in the central and frontal areas, and the P600 component was detected in the centroparietal area. 
L-FR did not elicit any obvious lateralization during semantic and syntactic processing.
For both-ears-filtered FL-FR stimuli, smaller amplitudes of N400 were elicited by FL-FR, indicating a small processing load for semantic manipulation. Further, smaller P600 amplitudes elicited by FL-FR were also found in the central and frontal areas, which suggested a lower mental workload for syntactic processing. This result supports the assumptions of the authors that 320 Hz low-pass filtering reduces the processing load for language signals since the filtered sounds only contain intonation and rhythm information that lightens the load for processing meanings of the signals (Asp et al., 2012;Guberina, 1972;He et al., 2015;Lian, 1980;Yang, 2016). As to the response time, FL-FR with only prosodic information sounded unusual and was an unexpected language signal for the participant, and thus, required a longer response time to process both semantic and syntactic information. Only one exception occurred in the centroparietal area where FL-FR induced a shorter response time regarding semantic processing relative to the NL-NR signals. Topographic maps suggest that N400 and P600 elicited by NL-NR presented a basically symmetrical distribution in the central and frontal areas, which was consistent with the results of distributions in the other two runs of the experiment. But the N400 component elicited by FL-FR was distributed in the occipital area and the P600 showed a slightly left-lateralized distribution in the central area.
Additionally, the environmental sounds of running water, train noises, and birds singing, used as novel stimuli in three runs of the experiment, did not elicit any language-related N400 or P600 component. This indicates that the brain distinguishes language and non-language signals successfully and processes them differently.

fMRI data
Brain activations and activation maps for processing FL-FR, FL-R, L-FR, and NL-NR stimuli are presented in Table 3 and Fig. 7. fMRI results indicate that the neural processing patterns of low-pass filtered and unfiltered language signals do differ in the four dichotic configurations of stimuli (i.e., FL-FR, FL-R, L-FR, and NL-NR stimuli) even though the sentences were the same in four runs of the experiment. Generally, 320 Hz low-pass filtered signals led to lower activation levels in the brain. The FL-FR and the NL-NR signals both induced increased activation in the bilateral superior temporal gyrus (STG), but the regions activated by NL-NR extended to the bilateral inferior parietal lobule and the right precentral gyrus. Further, the midbrain, cerebellum, pons, posterior cingulate, and corpus callosum were increasingly activated by NL-NR. In general, higher levels of activation were induced by the NL-NR signals relative to FL-FR. This confirms the previous assumption that low-pass filtered language signals could lighten the listener's mental workload for processing the meanings of words (Asp et al., 2012; Guberina, 1972; He et al., 2015; Lian, 1980; Yang, 2016).
FL-FR, as a signal of speech intonation and rhythm, presents a neural processing pattern of linguistic prosody. The results revealed that the left middle frontal gyrus (MFG), bilateral STG, right inferior frontal gyrus (IFG), and superior frontal gyrus (SFG) were involved in prosodic processing; in addition, more frontal areas of the right hemisphere were involved. This result is consistent with the findings of the recent studies conducted by Chien et al. (2020, 2021) that Chinese (a tonal language) speakers recruit bilateral fronto-temporal regions for intonation processing. By using unfiltered monosyllabic Mandarin words as auditory stimuli, Chien et al. (2020, 2021) proposed that the connection between left IFG and bilateral temporal areas may reflect the phonological processing network for auditory intonation perception and prosodic categorization. However, the current study revealed stronger activation in right IFG induced by FL-FR. This may result from the stimuli used in the current study. Unlike the phonological processing of the unfiltered words, the low-pass filtered sentence signals contain only prosodic information without noticeable segmental features of the speech sounds. Thus, the FL-FR signals induced activation in the bilateral fronto-temporal areas for intonation and rhythm processing; in addition, a pathway linking the posterior temporal area to IFG in the right hemisphere was prominently involved in the prosodic processing (Meyer et al., 2002; Sammler et al., 2015).
Fig. 7 Brain activation maps for a FL-FR (clusters are thresholded at p < .001, uncorrected), b FL-R (clusters are thresholded at p < .001, uncorrected), c L-FR (clusters are thresholded at p < .03, uncorrected), and d NL-NR stimuli (clusters are thresholded at p < .001, uncorrected). LH, left hemisphere; RH, right hemisphere
The FL-R stimuli, which are assumed to be consistent with ear advantage, sent prosodic signals to the right hemisphere and directed linguistic signals to the left hemisphere. The results revealed a left-dominant processing pattern for the unfiltered signals, while the filtered signals produced smaller activated areas and a lower activation level in the right hemisphere. Similar to FL-FR, the low-pass filtered stimuli, as discussed above, do induce lower activation levels and smaller activated areas in the right hemisphere. Whether the filtered prosodic signals were sent to the brain diotically (FL-FR) or dichotically (FL-R), low-pass filtering induced activation of the right-hemispheric STG and MFG, indicating an auditory prosodic processing pattern with a lower activation level relative to the unfiltered signal. As to the linguistic signal dichotically sent to the left hemisphere, stronger involvement of left inferior parietal lobule (supramarginal gyrus, BA 40, and posterior transverse temporal area, BA 42), Heschl's gyrus (BA 41), and MFG was detected. This result is basically consistent with the "semantic hubs" for manipulating a concept or meaning of the spoken or written language symbols (Pulvermüller, 2013; Pulvermüller & Fadiga, 2016), including left-hemispheric inferior frontal areas (Bookheimer, 2002), inferior parietal regions (Binder & Desai, 2011), and anterior or posterior-middle temporal areas (Hickok & Poeppel, 2007; Patterson et al., 2007). Although left inferior frontal regions in the "semantic hubs" were not significantly activated by the FL-R signals, increased activation was observed in the middle frontal gyrus. This result may be due to the L2 auditory stimuli, as MFG is related to higher-level cognitive control and, together with inferior frontal regions, is essential for effective communication, especially in a foreign language (Mårtensson et al., 2012; Sierpowska et al., 2018).
Thus, FL-R induces prosodic and semantic processing patterns, with lower activation levels and smaller activated areas in the right hemisphere. Furthermore, the FL-R signals significantly induce the left-hemispheric semantic processing pattern and higher-level cognitive processing of foreign language signals.
For L-FR, significant activation appeared only at a cluster threshold of p < .03, rather than the p < .001 used for the other three stimuli. This indicates that L-FR induced a lower activation level than the other three signals. According to the ear advantages, L-FR is presumably a non-optimal language input that violates both the left and right ear advantages, which may be the reason that both hemispheres appear reluctant to process the signal. Activation of the primary auditory cortex in bilateral STG, especially Heschl's gyrus (Da Costa et al., 2011), was revealed, but no specific language processing pattern was found. In addition, the L-FR signals induced greater activation in the corpus callosum than in any other cerebral region. The corpus callosum, which connects the two hemispheres, has been identified as assisting language processing and language lateralization (Hinkley et al., 2016). As L-FR is not favored by either hemisphere, stronger activation of the corpus callosum may indicate that the left and right hemispheres each redirect the signal to the other after receiving it. As a result, L-FR leads to activation of the primary auditory cortex and to lower activation levels in the rest of the brain, while both hemispheres seem to send the signals away instead of processing them.

Relationship between ERP and fMRI results
ERP and fMRI results revealed distinct neural processing patterns for the FL-FR, FL-R, L-FR, and NL-NR stimuli, which supports our assumption that dichotic listening to low-pass filtered prosodic signals and unfiltered linguistic signals induces different processing mechanisms due to the left and right ear advantages. Because the English (L2) sentences used as auditory stimuli were the same in each run of the combined ERP and fMRI experiment, the differences in processing patterns can be attributed to the dichotic configurations of the filtered and unfiltered stimuli. Thus, there should be an optimal language signal for foreign language learners, one that is consistent with the left and right ear advantages, and a non-optimal signal, one that violates them.
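To make the configurations concrete, the following is a minimal sketch of how such stimuli could be assembled from a mono recording. It is illustrative only and is not the authors' stimulus-preparation procedure: the function names `low_pass` and `make_dichotic` are hypothetical, and the first-order filter is an assumption (only the 320 Hz cut-off is reported). NL-NR is treated as unfiltered in both ears, following the description of the NL-NR stimuli as unfiltered.

```python
import math

CUTOFF_HZ = 320.0  # low-pass cut-off reported in the study


def low_pass(samples, sample_rate, cutoff_hz=CUTOFF_HZ):
    """One-pole low-pass filter (assumed design; the paper does not specify one)."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor derived from the cut-off
    out, prev = [], 0.0
    for x in samples:
        prev += alpha * (x - prev)  # move the output a fraction alpha toward the input
        out.append(prev)
    return out


def make_dichotic(samples, sample_rate, config):
    """Return (left_ear, right_ear) channels for one named configuration."""
    filtered = low_pass(samples, sample_rate)
    plain = list(samples)
    layouts = {
        "FL-FR": (filtered, filtered),  # filtered in both ears (diotic)
        "FL-R": (filtered, plain),      # filtered left ear, unfiltered right ear
        "L-FR": (plain, filtered),      # unfiltered left ear, filtered right ear
        "NL-NR": (plain, plain),        # unfiltered in both ears (assumed reading)
    }
    return layouts[config]
```

Calling `make_dichotic(samples, 16000, "FL-R")` on a list of float samples returns the left and right channels, which could then be written to a stereo file. A first-order filter rolls off gently, so a steeper design would be needed to remove segmental information as thoroughly as the stimuli described here.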
Compared with NL-NR, FL-FR imposed a lower mental load in both the ERP and fMRI experiments. The amplitudes of the ERP components for semantic and syntactic processing elicited by FL-FR suggested that FL-FR reduced the processing load compared with the unfiltered NL-NR stimuli, although the response time for semantic and syntactic manipulations was longer. The reason for the longer response time may be that the filtered stimuli carry limited semantic and syntactic information and are unexpected for the participants. fMRI results revealed lower levels of activation compared to NL-NR. In addition, a neural processing pattern for prosody was detected in the bilateral fronto-temporal areas, especially in the right hemisphere. Topographic maps obtained from the ERP experiment showed that FL-FR induced a slightly left-lateralized distribution in the central areas for syntactic processing, which was broadly consistent with the fMRI finding of stronger activation in the left middle frontal gyrus.
The brain was actively involved in semantic manipulation of the FL-R signals, with a higher mental load and a longer response time in the ERP results relative to NL-NR. However, FL-R was syntactically easier to process, with a lower mental workload. A left-lateralized distribution of the components was identified in the ERP experiment and confirmed in the fMRI experiment. fMRI results showed stronger involvement of the left hemisphere, with prosodic and semantic processing patterns. The "semantic hubs" in the left hemisphere (Pulvermüller, 2013; Pulvermüller & Fadiga, 2016) were activated by the FL-R signals, along with a higher level of cognitive control of L2 signals. Meanwhile, the right hemisphere showed lower activation levels and smaller activated areas. FL-R, consistent with the left and right ear advantages, induced a lower mental workload in the right hemisphere by sending it prosodic information. At the same time, the linguistic signals sent to the left hemisphere led to stronger activation for semantic processing. The higher mental load and longer response time detected in the ERP experiment may result from the use of foreign language signals as stimuli, which required a higher level of cognitive control, as reflected in the significant activation of left MFG observed in the fMRI experiment (Mårtensson et al., 2012; Sierpowska et al., 2018). FL-R can be identified as an optimal language input because prosodic and semantic processing patterns were significantly activated in the left hemisphere, while the activation level or processing load in the right hemisphere was lowered. FL-R optimized the auditory language input by actively involving left-hemispheric dominance for language and lowering cognitive load in the right hemisphere, freeing it for other higher-order/complex cognitive processes (Galotti, 2017; Levine, 2009).
L-FR induced a higher processing load for semantic and syntactic manipulations in the ERP experiment. A shorter response time for semantic processing and a longer response time for syntactic manipulation were observed compared to the NL-NR signals. The N400 and P600 effects elicited by L-FR indicated that the brain struggled to process this signal. In contrast to the higher processing load detected in the ERP experiment, fMRI results suggested a considerably lower activation level relative to the other three stimuli. In addition, both hemispheres appeared to struggle with the signals they received and to be reluctant to process them, apparently redirecting the signals to the other hemisphere via the corpus callosum. As a result, the strongest activation induced by L-FR was found in the corpus callosum rather than in other cerebral areas. Thus, L-FR can be identified as a non-optimal auditory language input for language learners.
The ERP study also revealed that the neural mechanisms for processing language signals and environmental sounds (non-verbal signals) were different, so environmental sounds did not trigger any language-related ERP components.

Conclusion
To sum up, this case study employed a combined ERP and fMRI experiment to identify an optimal auditory language signal for Chinese ESL learners. The FL-R signal, derived from the principles of verbotonalism and used in a dichotic context, can be considered the optimal signal for Chinese ESL learners, as it is demonstrably consistent with the left and right ear advantages. Specifically, FL-R induced active prosodic and semantic processing of the linguistic signals in the left hemisphere while lowering the processing load in the right hemisphere. The FL-R language input signal therefore appears best suited for optimal processing. As a result, FL-R should, in principle, enable learners to make better sense of foreign language signals and thereby facilitate language learning (Guberina & Asp, 1981; Lian, 2004). The current study also identified a non-optimal language signal, L-FR, which should be avoided in learning or teaching a language. While the authors acknowledge that this is a limited case study requiring additional confirmation, the current study finds its origins in a rater-based study. The results of that study showed that the group of participants listening to the FL-R signals during an ESL course outperformed the L-FR and NL-NR groups in English pronunciation, especially in intonation and fluency performance, while the L-FR group achieved the least improvement. It is therefore concluded that there does exist an optimal language input (FL-R) and a non-optimal input (L-FR) for foreign language learners. A clear pedagogical implication is that ESL learners (and perhaps others too) would benefit from dichotic FL-R input while learning English. Future studies should recruit a larger number of participants in order to generalize the findings.