Age-Related Differences in Auditory Statistical Learning of Complex Speech Streams: Evidence From ERPs

Statistical learning (SL), the ability to pick up regularities in the sensory environment, is a fundamental skill that allows us to structure the world in a regular and predictable way. Although extensive evidence has been gathered from children and adults, the changes that SL might undergo throughout development remain contentious, particularly in the auditory modality (aSL) with linguistic materials. Here, we collected Event-Related Potentials (ERPs) while 5-year-old children and young adults (university students) were exposed to a complex speech stream in which eight three-syllable nonsense words, four with high (1.0) and four with low (.50) transitional probabilities, were embedded to further examine how aSL works under less predictable conditions and to enhance age-related differences in the neural correlates of aSL. Moreover, to ascertain how previous knowledge of the to-be-learned regularities might affect the results, participants performed the aSL task, firstly, under implicit and, subsequently, under explicit conditions. Although behavioral signs of aSL were observed only for adult participants, ERP data showed evidence of aSL in both groups, as indexed by modulations in the N100 and N400 components. A detailed analysis of the neural responses suggests, however, that adults and children rely on different mechanisms to assist aSL.

This occurs, at least in part, because early works in SL, as well as in the related field of implicit learning (IL) [see 12 for a discussion], claimed that SL/IL is an early-maturing ability that remains quite stable across development, as no differences in performance had been observed between children and adults [e.g., 8,13,14]. However, a growing body of research conducted in the last decade has challenged this view by showing that SL/IL improves with age [see 15,16 for reviews]. Moreover, recent works suggest that the developmental trajectory of this ability might not be the same across sensory modalities and types of stimuli [e.g., 5,6,17]. For instance, Raviv and Arnon 5, using auditory syllables and visual figures in visual and auditory SL tasks (vSL and aSL, respectively) modeled on Saffran et al.'s 1 pioneering work, showed that while vSL improved from 5- to 12-year-old children, aSL did not, which could account for the mixed results in the literature showing age differences for vSL [e.g., 3,18], but not for aSL [e.g., 8; see, however, Emberson et al. 4 for SL improvements in both modalities]. Nevertheless, in a subsequent work, Shufaniya and Arnon 6 showed that the absence of age differences in aSL was not due to the sensory modality per se but rather to the type of stimuli. Indeed, using non-linguistic auditory materials (i.e., familiar sounds) instead of auditory syllables, the authors found that vSL and aSL skills improved in the same age range. These findings suggest that SL is not age-invariant, as claimed by earlier works, except for auditory linguistic materials, which is also consistent with other works claiming that infancy is a privileged time for language acquisition [e.g., 19-22], as adults are not better than children at learning new languages, probably due to competition with other mechanisms that appear later in development [see 23-25].
It is worth noting, however, that these findings were obtained using a two-alternative forced-choice (2-AFC) task, which has been subject to increasing criticism [see 3,15,26-28 for an extended discussion]. Indeed, in this task participants are typically asked to identify which of two three-syllable nonsense words (e.g., 'tokibu' vs. 'kipopi') most resembles the ones heard in a previous familiarization phase in which, unbeknownst to them, syllables are grouped into triplets that always appear together in the stream (e.g., 'tokibutipolugopilatokibu'). If performance is greater than chance, SL is assumed to have occurred, as only tracking the statistical regularities embedded in the auditory stream presented during the familiarization phase allows a correct 'word' discrimination. Despite the widespread use of this task to test aSL, it is important to emphasize that it relies strongly on explicit judgments, which depend largely on other high-order cognitive skills (e.g., decision-making processes) that might not be fully developed in children of this age [see 29 for similar arguments]. Thus, further research using other tasks and/or methodologies is required to gain a deeper understanding of how aSL with linguistic materials might change throughout development. Moreover, it is also important to note that most studies tested aSL using three-syllable nonsense words presenting the same level of predictability, i.e., transitional probabilities (TPs) of 1.00, which might limit not only our understanding of the conditions under which aSL works but, importantly, the chances of observing age-related differences in aSL [see, however, 9,30-33 for works using triplets with different TPs]. Note that a triplet with a TP of 1.00 means that a given stimulus (e.g., a syllable) is always followed by another stimulus (e.g., another syllable) in the stream, thus reducing the variance along which the aSL ability can be measured. Besides, as Soares et al. 9 recently pointed out, removing all signs of uncertainty from the input does not mimic what occurs in natural languages, which also reduces the possibility that aSL results relate to other linguistic outcomes. Indeed, in natural languages, syllables, as well as other linguistic units (e.g., phonemes, morphemes), do not follow each other with 100% certainty, as the same syllable can occur in many different words at different syllable positions, such as the syllable /cur/ in words like /cur.va.ture/, /in.cur.sion/, or /re.oc.cur/. Additionally, the 2-AFC task is an offline post-learning task, measuring the result of the learning that has occurred in the previous familiarization phase, and not the processes underlying that result [see 9,28,34 for an extended discussion]. Thus, it is possible that even though children and adults might not differ in terms of aSL outcomes with linguistic materials, they might nevertheless differ in the mechanisms they recruit to reach those outcomes.
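The notion of a transitional probability can be made concrete with a short sketch. Assuming a toy syllable stream (the syllables and counts below are illustrative, not the study's actual materials), forward TPs are simply pair frequencies divided by the frequency of the leading syllable:

```python
from collections import Counter

def transitional_probabilities(stream):
    """Forward TPs between adjacent syllables: TP(B|A) = freq(A,B) / freq(A)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    left_counts = Counter(stream[:-1])  # occurrences of each syllable as a left element
    return {(a, b): n / left_counts[a] for (a, b), n in pair_counts.items()}

# Toy stream: 'tu-ci-da' behaves like a high-TP 'word' (its syllables occur nowhere
# else), while 'do' is followed by 'ti' on only half of its occurrences (low-TP).
stream = (['tu', 'ci', 'da', 'do', 'ti', 'ge'] +
          ['tu', 'ci', 'da', 'do', 'mi', 'ge']) * 5
tps = transitional_probabilities(stream)
print(tps[('tu', 'ci')])  # 1.0 -> 'ci' always follows 'tu'
print(tps[('do', 'ti')])  # 0.5 -> 'ti' follows 'do' half of the time
```

In this toy stream the within-'word' TPs of 1.00 and .50 mirror the high-TP and low-TP manipulation discussed above.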
This work aimed to directly address these issues by examining the electrophysiological (ERP) correlates elicited during the familiarization phase of an aSL task with linguistic materials (syllables) while 5-year-old children and young adults (university students) were exposed to the repetition of eight three-syllable nonsense words, four with TPs of 1.00 (high-TP 'words') and four with TPs of .50 (low-TP 'words'), presented in two different blocks to further examine how ERPs might change as a function of words' predictability and length of exposure. Furthermore, to examine the role that previous knowledge of the to-be-learned regularities might play in aSL, an issue that is gaining increased interest given the proposal that previous linguistic knowledge might interfere with SL outcomes [see 35,36], participants performed the aSL task, firstly, under implicit and, subsequently, under explicit conditions to minimize the role of individual differences in performance [see 32, and also 9, for a similar procedure]. Behavioral data were also collected from the standard 2-AFC task presented after the familiarization phase of each of the aSL tasks.
Previous studies analyzing the electrophysiological correlates of aSL during exposure to the continuous stream identified the N100 component and, particularly, the N400 component as indexes of words' segmentation and of the emergence of a pre-lexical trace of 'words' in the brain [e.g., 9,37-43]. It is relevant to note that the vast majority of previous ERP studies did not provide information regarding the changes that the neural correlates of aSL might undergo as exposure to the input stream unfolds [for exceptions, see 9,34,37,41]. This is particularly important because children and adults might differ in the amount of exposure they need to unravel the statistical structure of the input, showing potential neurofunctional differences that might not be detected unless the length of exposure is taken into account. Therefore, if children and adults differ in the processes underlying aSL with linguistic materials, even though differences in 2-AFC performance might not be observed, as in previous studies [e.g., 5,6,8], we expect modulations in the N100 and N400 ERP components to be observed across groups of participants for the high- and low-TP 'words' under implicit and explicit conditions as exposure to the speech stream unfolds.

Behavioral
The mean percentages of correct responses in the implicit and explicit 2-AFC tasks for the high-and low-TP 'words' in each group of participants are presented in Table 1.
The results of the one-sample t-tests against chance level in the group of children showed that 2-AFC performance did not differ from chance in either of the aSL tasks for either type of 'word' (all ps > .115). In the adult group, the results showed that 2-AFC performance exceeded chance level for the low-TP 'words', t(20) = 2.264, p = .015, in the implicit condition, and for the high-TP, t(20) = 2.592, p = .017, and low-TP 'words', t(20) = 3.543, p = .002, in the explicit condition. These findings indicate that, unlike children, adults showed behavioral signs of SL in both aSL tasks and for both types of 'words', except for the high-TP 'words' in the implicit condition.
ERP Data

N100

Children. In this ERP component, the ANOVA showed a main effect of block, maximal at the fronto-central ROI, F(1,19) = 5.22, p = .034, ηp² = .215, indicating that, regardless of the aSL task and type of 'word', children showed a larger N100 amplitude in the second half than in the first half of the aSL tasks (see Figure 3). No other main or interaction effects reached statistical significance.
Adults. Maximal effects were observed at the central ROI in this ERP component. The ANOVA showed a main effect of aSL task, F(1,20) = 10.58, p = .004, ηp² = .346, indicating an enhancement in the aSL task performed under the explicit compared to the implicit condition. A main effect of block was also observed, F(1,20) = 5.16, p = .034, ηp² = .205, indicating, as for the children, a larger N100 amplitude in the second half than in the first half of the aSL tasks (Figure 3). In addition, the two-way aSL task * type of 'word' interaction reached a marginally significant level, F(1,20) = 4.31, p = .051, ηp² = .177. In this interaction, the effect of task was found for the high-TP 'words', showing larger N100 amplitudes in the aSL task performed under the explicit than the implicit condition (p = .001). In addition, the effect of type of 'word' was found under the explicit condition, with larger N100 amplitudes for the high- than for the low-TP 'words' (p = .039). Figure 1 depicts that effect.

N400

Children. In the group of children, the effects of aSL task, type of 'word', and block interacted at the central ROI; Figure 2 depicts this interaction.
Pairwise comparisons showed that the effect of aSL task consisted of an enhancement of the N400 component under the explicit relative to the implicit condition, observed for low-TP 'words' in the first half of the task (p = .030), while in the second half of the task this effect was observed for high-TP 'words' (p = .027).
Moreover, a significant effect of type of 'word' was found in interaction with the instructions and block effects, showing a larger amplitude for low-TP 'words' than high-TP 'words' in the first half of the explicit task (p = .041). Finally, the effect of block reached significance for low-TP 'words' under explicit instructions, resulting in a larger N400 amplitude in the first half than in the second half (p = .022).
Adults. Adults showed a significant main effect of type of 'word' at the central ROI, F(1,20) = 6.88, p = .016, ηp² = .256, with a larger N400 for the high-TP than for the low-TP 'words' regardless of the aSL task (Figure 2). Moreover, a main effect of block was also observed, F(1,20) = 8.15, p = .010, ηp² = .289, indicating an enhancement in this component in the second half relative to the first half of the aSL tasks (Figure 2). No other main or interaction effects reached statistical significance.

Discussion
The present study aimed to examine age-related differences in the ERP correlates of aSL with linguistic materials. Five-year-old children and young adults (college students) were exposed to a speech stream in which the statistical regularities of three-syllable nonsense words, with high and low predictability, had to be extracted through passive exposure (implicit condition) or after the nonsense words had been explicitly taught to the participants (explicit condition). With this design we aimed to further analyze whether children and adults rely on the same core mechanisms to extract word-like units from exposure to a more complex speech stream and, ultimately, to shed light on the changes that SL might undergo throughout development, since the few behavioral studies conducted so far have led to inconsistent results. Behavioral data were also collected from a standard 2-AFC task performed after the familiarization phase of each aSL task.
The behavioral data showed that 2-AFC performance only differed from chance for adult participants, in both aSL tasks and for both types of 'words', except for the high-TP 'words' in the aSL task performed under implicit conditions. In the group of children, performance did not differ significantly from chance regardless of words' predictability and the conditions under which they were presented. The absence of reliable signs of SL in the group of 5-year-old children is not new. In Raviv and Arnon's 5, and also Shufaniya and Arnon's 6 works, the authors did not find evidence of SL for children under 6 years of age, which led them to exclude these children from the analyses to avoid age differences that simply reflected the move from chance to above-chance 2-AFC performance [see also 17 for similar findings with the artificial grammar learning paradigm]. In our case, the absence of reliable signs of aSL in the 2-AFC tasks performed by children might also stem from the complexity of the speech stream used, which entailed not only a higher number of 'words' (eight) than used in previous works (typically four to six 'words'), but also 'words' that were more diverse in their composition. Indeed, unlike the vast majority of aSL studies conducted so far, which only used 'words' with high levels of predictability (TPs = 1.0), here we also used 'words' with lower levels of predictability (TP = .50), which might have contributed to making the extraction of the statistical regularities embedded in the stream harder to achieve [see 9,35].
Moreover, the joint presentation of high- and low-TP 'words' in the same auditory stream might also have made the extraction of the regularities embedded in the input much more challenging, as 'words' could be made up, or not, of unique syllables, which could have provided conflicting cues for 'word' segmentation.
Nevertheless, in the adult group, the behavioral results indicate a moderate, but reliable, level of learning in both aSL tasks, although, interestingly, performance was better for the low- than for the high-TP 'words'. This unexpected result can be accounted for if we attend to an inevitable consequence of the TP manipulation in our study, as in all other studies that used 'words' with different levels of predictability [e.g., 9,32,35]. Indeed, even though high- and low-TP 'words' were presented exactly the same number of times during familiarization to control for 'word' frequency effects, the fact that high-TP 'words' comprise unique syllables, unlike low-TP 'words', whose syllables occurred in different 'words' at different syllable positions, meant that learning the low-TP 'words' involved not only the encoding of a smaller number of syllables than the high-TP 'words' (4 vs. 12, respectively) but, importantly, syllables that occurred three times more frequently than the syllables of the high-TP 'words'. These differences in the number of distinct syllables and in the number of times each syllable appeared in the stream might have led participants, when asked to decide which of two stimuli 'sounded more familiar' based on the stream presented before, to choose the low-TP 'words' over other possible candidates, as the low-TP 'words' were made of syllables that occurred more often in the stream, certainly generating a higher level of familiarity.
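The frequency asymmetry this explanation rests on follows directly from the design and can be checked with a few lines of arithmetic. The sketch below uses placeholder syllable labels; only the counts (4 vs. 12 syllables, three positions per low-TP syllable, 60 repetitions per 'word') come from the design described in this paper:

```python
from collections import Counter

# Four high-TP 'words', each built from three unique placeholder syllables (12 total).
high_tp_words = [('h1', 'h2', 'h3'), ('h4', 'h5', 'h6'),
                 ('h7', 'h8', 'h9'), ('h10', 'h11', 'h12')]

# Four low-TP 'words' built from only 4 syllables, each appearing in three
# different 'words' at different positions (as in 'dotige'/'tidomi'/'migedo').
low_syllables = ['do', 'ti', 'mi', 'ge']
low_tp_words = [tuple(low_syllables[(i + k) % 4] for k in range(3)) for i in range(4)]

# Every 'word' is heard 60 times during familiarization.
counts = Counter()
for word in high_tp_words + low_tp_words:
    for syllable in word:
        counts[syllable] += 60

print(counts['h1'])  # 60  -> a high-TP syllable occurs once per 'word' repetition
print(counts['do'])  # 180 -> a low-TP syllable occurs three times as often
```

Despite matched 'word' frequencies, the individual low-TP syllables are heard three times more often, which is the familiarity asymmetry invoked above.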
The absence of significant learning effects in the group of children, even when explicit instructions about the to-be-learned regularities were presented before exposure, converges with other works claiming that the 2-AFC task is not well suited to test SL, particularly in children of this age [see 15,29]. As mentioned in the introduction, the 2-AFC task requires participants to make explicit judgments about regularities that are expected to be acquired implicitly, which might be hard to accomplish for 5-year-old children.
Note that in each 2-AFC trial participants have to process two highly similar stimuli (a 'word' and a foil) and to decide which one belongs to the artificial language they heard before. This requires working memory capacities that are still under development at this age [see 44] and might also explain why adults, but not children, seem to take advantage of previous knowledge of the to-be-learned regularities to boost 2-AFC performance [e.g., 3,45].
The ERP data showed, however, modulations in the N100 and N400 components, taken as the neural signatures of SL in the brain [e.g., 37,40,46], in both groups of participants. Specifically, the ERPs recorded during familiarization showed larger N100 amplitudes during the second half of the aSL tasks relative to the first, in both adults and children, suggesting that this ERP component is sensitive to the amount of exposure to the statistical structure embedded in the auditory stream. Previous research has considered the N100 a putative 'marker' of online segmentation in the brain [e.g., 37,46,47], but the literature still presents divergent findings regarding how the N100's amplitude is modulated by specific factors [e.g., 9,38,40,48]. Our findings are in line with previous research showing enhancements in the N100 in the last part of the familiarization phase [see 37] and suggest that the N100 indexes transient effects that change as learning progresses. More importantly, they also suggest that an early brain mechanism of SL is already present in 5-year-old children for decoding linguistic input, which agrees with works claiming that SL is an early-maturing skill supporting language acquisition [e.g., 1,8; see 49 for a review], even though behavioral evidence of SL might only be noticed at a later developmental stage, at least when using the 2-AFC task, which requires the maturation of other cognitive skills.
Additionally, in adults, we found evidence of a larger N100 when participants were provided with prior knowledge about the 'words' they were exposed to during familiarization. Given the early, sensory nature of this neural component, this might indicate that explicit conditions facilitate the extraction of the regularities, particularly for those 'words' presenting high TPs, triggering an enhancement in this ERP component. This result is also consistent with other findings indicating that the development of executive functions could interfere with aSL results [e.g., 23-25]. Since, behaviorally, we found aSL to be increased when adults were provided with explicit instructions, we might also consider the N100 a highly sensitive neural proxy of aSL. In children, we did not find evidence that the amplitude of the N100 was modulated by previous knowledge about the 'words', only by mere exposure to the stream. However, the ERP data showed that explicit instructions influenced later stages of processing, which could indicate that children's developing brains use such information in further processing. In fact, the effects found on the N400, related to the activity of specialized language circuits, seem to support this idea.
In the N400 component, we also found larger amplitudes for high- than low-TP 'words' in the group of adult participants, regardless of the aSL task performed and the amount of input exposure. This result replicates that of Soares et al.'s 9 work and suggests that the N400 component can be taken as a reliable index of the emergence of a pre-lexical trace of 'words' in the brain, as modulations in this component reflect the degree to which representations of linguistic sequences are being integrated into memory and language-related brain circuits [see 50]. The fact that high-TP 'words' elicited larger N400 amplitudes indicates not only that this type of linguistic sequence is more easily extracted from the input [see also 9,35 for similar conclusions], but also corroborates the brain's ability to effectively decode the structure of continuous streams of syllables, distinguishing highly probable from less probable sequences, even when 'extra' (metalinguistic) information about the to-be-learned regularities was not provided.
In contrast, the N400 in children was enhanced by explicit instructions. Under explicit conditions, the 'word' predictability effect showed that low-TP 'words' elicited larger N400 amplitudes than high-TP 'words' in the first part of the aSL task, contrary to what was found for adult participants.
This finding suggests that, unlike adults, children seem to guide the extraction of the statistical regularities embedded in the input by computing the frequency with which each syllable occurred in the stream rather than syllables' TPs. Indeed, the fact that high-TP 'words' entailed a higher number of syllables, which occurred three times less often in the stream than those of the low-TP 'words', might have led the immature cognitive system to use a more 'economic' strategy (syllable frequency instead of syllables' TPs) to predict upcoming events, and to use that knowledge to facilitate the learning of lower-frequency elements later on [see 36,51,52]. The fact that during the first half of the aSL task presented under explicit conditions children showed larger N400 amplitudes for low-TP 'words', and during the second half of the same task larger N400 amplitudes for high-TP 'words', seems to accord with this rationale, although future research should further explore this issue by manipulating the use of different statistics (conditional vs. distributional) in the input.

Conclusion
The present study is, to the best of our knowledge, the first to report ERP evidence of age-related differences in the mechanisms used by children and adults to extract word-like units from a much more complex and diverse speech stream that closely mimicked what occurs in 'real' environments. It highlights the usefulness of the ERP methodology in coping with the limitations of post-learning behavioral methods for testing SL (particularly the 2-AFC task with children under 6 years of age). It also sheds light on how the mechanisms underlying SL might change, not only across sensory modalities, as previous authors have already highlighted [e.g., 4-6,8], but, importantly, across different types of statistical predictability. Our results also support the idea we have been claiming, in line with other authors [e.g., 34], that online and offline/behavioral SL measures tap into distinct cognitive processes, the first related to how the brain computes statistical regularities embedded in the sensory input, and the second to how people retrieve learned patterns from memory. Future research should thus incorporate online measures to provide new insights into how the cognitive system segments complex speech streams and takes advantage of different regularities embedded in the sensory input at different developmental stages.

Method

Ethics Statement
The study was carried out in accordance with the guidelines of the Declaration of Helsinki and approved by the local Ethics Committee (University of Minho, SECSH 028/2018). Written informed consent was obtained from each adult participant and from parents/legal representatives in the case of child participants.

Participants
Twenty-four children (13 female, M age = 5;7; range 5;1 to 6;5) from Portuguese kindergarten institutions, and twenty-four students (22 female, M age = 20;3; range 18;1 to 31;2) from the University of Minho participated in the study. All participants were native speakers of European Portuguese, with normal hearing and no reported history of learning or language disabilities and/or neurological problems. All were right-handed, as assessed by the Portuguese adaptation of the Edinburgh Handedness Inventory 53,54 .

Stimuli
Thirty-two auditory European Portuguese CV syllables (e.g., 'tu', 'ga', 'ci', 'pa', 'be'), evenly distributed over two syllabaries (syllabary A and syllabary B), were taken from Soares et al.'s 9 study to create the 16 nonsense words used in the implicit and explicit versions of the aSL tasks. Syllables were produced and recorded by a native speaker of European Portuguese with a duration of 300 ms each. Syllables in each syllabary were organized into eight three-syllable nonsense words: four high-TP 'words' (TP = 1.0) and four low-TP 'words' (TP = .50). For instance, the nonsense word 'tucida' from syllabary A and the nonsense word 'todidu' from syllabary B correspond to high-TP 'words', as the syllables they entail only appear in those 'words' and in those specific syllable positions, while the nonsense word 'dotige' from syllabary A and the nonsense word 'pitegu' from syllabary B correspond to low-TP 'words', as the syllables they entail appear in three different 'words' at different (initial, medial, final) syllable positions ('tidomi', 'migedo', and 'tepime', 'megupi', respectively; see Table 2 for other examples).
The nonsense words were concatenated into a continuous 8.4-min stream with the Audacity® software (1999-2019), with a 50-ms interval between syllables. Each nonsense word was repeated 60 times across six blocks of 10 repetitions each (1.4 min per block; see Figure 4). In each block, the nonsense words were presented binaurally in random order with the restriction that the same nonsense word or the same syllable would never appear consecutively. TPs across 'word' boundaries were .14 for the high-TP 'words', as any of these 'words' could be followed by any of the remaining seven 'words', and .17 for the low-TP 'words', as any of these 'words' could only be followed by any of six possible 'words', to prevent cases such as 'tidomi' and 'migedo' occurring successively in the stream. The speech stream was edited to include, in ~10% of the syllables, a superimposed chirp sound (a 0.1-s sawtooth wave from 450 to 1,450 Hz) to provide participants with a cover task (i.e., a chirp-detection task) that ensured adequate attention to the auditory stimuli, as in previous studies [e.g., 2,9].
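A randomization scheme with these ordering constraints can be sketched as follows. This is a simplified illustration under stated assumptions (rejection-with-retry sampling, placeholder 'word' labels), not the authors' actual stimulus pipeline:

```python
import random

def build_block(words, reps, forbidden=frozenset()):
    """Order `reps` tokens of each word so that the same word never repeats
    consecutively and no (predecessor, successor) pair listed in `forbidden`
    occurs, e.g. ('tidomi', 'migedo'), which would create a spurious boundary."""
    for _ in range(10_000):                 # retry on the rare dead end
        remaining = {w: reps for w in words}
        out, prev = [], None
        while any(remaining.values()):
            allowed = [w for w, n in remaining.items()
                       if n > 0 and w != prev and (prev, w) not in forbidden]
            if not allowed:
                break                       # dead end: restart with a fresh order
            prev = random.choice(allowed)
            remaining[prev] -= 1
            out.append(prev)
        if len(out) == len(words) * reps:
            return out
    raise RuntimeError("could not satisfy the ordering constraints")

words = [f"w{i}" for i in range(8)]         # eight placeholder 'words'
block = build_block(words, reps=10, forbidden={("w4", "w5")})
```

With eight 'words' and only the no-repetition constraint, each 'word' can be followed by any of the remaining seven, giving an expected boundary TP of 1/7 ≈ .14; excluding one forbidden successor leaves six candidates, giving 1/6 ≈ .17, matching the values reported above.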
For the 2-AFC task, eight three-syllable foils were also created for each syllabary (see Table 2). The foils were made up of the same syllables used in the 'words', presented with the same frequency and in the same syllable positions as in the high- and low-TP 'words'. For example, the most frequent syllables used during familiarization from syllabary A (e.g., 'do', 'ti', 'mi', 'ge'), which appeared three times in different low-TP 'words' (e.g., 'dotige', 'tidomi', 'migedo', 'gemiti'), were also presented three times in the foils (e.g., 'dobage', 'tidemi', 'mipedo', 'geciti'), whereas the less frequent syllables (e.g., 'tu', 'ci', 'da', 'bu', 'pe', 'po'), which appeared only once in the high-TP 'words' (e.g., 'tucida', 'bupepo', 'modego', 'bibaca'), were also presented once in the foils (e.g., 'tumica', 'bugego', 'modopo', 'bitida'). However, unlike the syllables in the high- and low-TP 'words', the syllables in the foils were never presented together during familiarization (TPs = 0). Note, however, that due to stimulus restrictions (the number of syllables in each syllabary and the need to generate sequences of syllables never presented together before), the foils associated with the high-TP 'words' entailed two syllables from the high-TP 'words' and one syllable from the low-TP 'words'. The same holds for the foils associated with the low-TP 'words', which entailed two syllables from the low-TP 'words' and one syllable from the high-TP 'words'. Four lists of materials were created to counterbalance syllables across positions and stimuli in each syllabary. Participants in each group were randomly assigned to one list from syllabary A and one list from syllabary B to perform the aSL task under implicit and explicit conditions, with the constraint that the same number of participants completed a given list (six participants per list).

Procedure
Data were collected in a shielded cabin at the Psychological Neuroscience Lab (School of Psychology, University of Minho). Participants were firstly presented with the implicit version of the aSL task and, subsequently, with the explicit version of an analogous aSL task (see Figure 4). In the implicit version, participants were instructed to pay attention to the auditory stream (sequences of syllables), presented at 60 dB SPL via binaural headphones, because occasionally a deviant sound (i.e., a chirp) would appear, and their task was to detect it as quickly and accurately as possible by pressing the spacebar on the computer keyboard (i.e., to perform a target-detection task), as in previous works. Following familiarization, participants were asked to decide as accurately as possible which of two auditory stimuli (one 'word' and one foil) 'sounded more like' the stimuli presented before (i.e., to perform a 2-AFC task). The 2-AFC task comprised 16 trials in which each of the 'words' was paired with two different foils.
After a brief interval, participants underwent the explicit version of the aSL task. Here, participants were exposed to a continuous stream made of another eight 'words' (four high-TP and four low-TP 'words') from the syllabary not used in the implicit version of the aSL task. The explicit version followed the same procedure, except that, prior to the familiarization phase, participants were presented with a training phase in which each of the eight new 'words' was presented individually and participants were asked to repeat each of them correctly before the familiarization phase began [see 9 for details]. The procedure took about 90 min to complete per participant. Figure 4 depicts a visual summary of the experimental design.

EEG Data Acquisition and Processing
Data collection was performed in an electrically shielded, sound-attenuated room. Participants were seated in a comfortable chair, one meter away from a computer screen. During the familiarization phase, EEG data were recorded with a 64-channel BioSemi ActiveTwo system (BioSemi, Amsterdam, The Netherlands) according to the international 10-20 system and digitized at a sampling rate of 512 Hz. Electrode impedances were kept below 20 kΩ. The EEG was re-referenced off-line to the algebraic average of the mastoids. Data were filtered with a 0.1-30 Hz bandpass filter (zero-phase-shift Butterworth). ERP epochs were time-locked to the nonsense words' onset, from -300 ms to 1,200 ms (baseline correction from -300 to 0 ms). Independent component analysis (ICA) was performed to remove stereotyped noise (mainly ocular movements and blinks) by subtracting the corresponding components. After that, epochs containing artifacts (i.e., with amplitudes exceeding +/-100 µV) were excluded. EEG data processing was conducted with Brain Vision Analyzer, version 2.1.1 (Brain Products, Munich, Germany).
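Although these steps were performed in Brain Vision Analyzer, the epoching, baseline-correction, and amplitude-rejection logic can be illustrated with a self-contained NumPy sketch. The sampling rate, epoch window, and rejection threshold follow the description above; the data and onsets are synthetic placeholders:

```python
import numpy as np

def epoch_and_reject(eeg_uv, onsets, sfreq=512, tmin=-0.3, tmax=1.2, thresh_uv=100.0):
    """Cut epochs (channels x samples, in microvolts) around word onsets,
    subtract the mean of the -300..0 ms baseline from each channel, and drop
    any epoch whose absolute amplitude exceeds +/-100 uV."""
    pre, post = int(-tmin * sfreq), int(tmax * sfreq)
    kept = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > eeg_uv.shape[1]:
            continue                                   # epoch outside the recording
        ep = eeg_uv[:, onset - pre:onset + post].astype(float)
        ep -= ep[:, :pre].mean(axis=1, keepdims=True)  # baseline correction
        if np.abs(ep).max() <= thresh_uv:              # artifact rejection
            kept.append(ep)
    return np.stack(kept) if kept else np.empty((0, eeg_uv.shape[0], pre + post))

# Synthetic demo: 2 channels, 8 s of low-amplitude noise plus one 200-uV artifact.
rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 5.0, size=(2, 8 * 512))
eeg[0, 1500:1510] = 200.0                              # blink-like artifact
epochs = epoch_and_reject(eeg, onsets=[512, 1500, 3000])
print(epochs.shape)  # (2, 2, 767): the epoch at onset 1500 was rejected
```

The returned array is epochs x channels x samples, ready for per-condition averaging into ERPs.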

Data Analysis
Behavioral and ERP data analyses were performed using IBM SPSS® (Version 27.0). For the behavioral data, the percentage (%) of correct responses was computed for each of the 2-AFC tasks and separately for the high-TP and low-TP 'words' in each group of participants (coded as 1 for a correct and 0 for an incorrect response). One-sample t-tests against chance level were conducted in each group of participants to determine whether performance in each aSL task and type of 'word' was significantly different from chance (50%). A mixed analysis of variance (ANOVA) with Group (children vs. adults) as a between-subjects factor, and aSL task (implicit vs. explicit) and Type of 'word' (high-TP vs. low-TP) as within-subjects factors was conducted to analyze whether 2-AFC performance differed significantly across groups and experimental conditions. Individual ERPs were averaged separately per condition. Grand average waveforms were then calculated across individuals in each aSL task (implicit vs. explicit), Type of 'word' (high-TP vs. low-TP), and Block, i.e., considering the first half (blocks 1-3) and the second half (blocks 4-6) of the aSL task. Seven participants (four children and three adults) were excluded from the EEG analyses (and also from the behavioral analyses) due to artifact rejection. Based on previous literature, mean amplitudes were measured for the following time windows, taken as the neural signatures of words' segmentation and the emergence of a pre-lexical trace of 'words' in the brain: 80-120 ms (N100 component) for both groups, and 400-500 ms and 350-450 ms (N400 component) for the children and adults, respectively. We chose a slightly later time window for the children since data inspection revealed a longer latency of the N400 component. This delay of the N400 component in children has already been described in the literature and is considered a normative developmental phenomenon [see 55-57].
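The one-sample t-test step can be sketched with SciPy. The per-participant accuracy values below are fabricated for illustration only, not the study's data:

```python
import numpy as np
from scipy import stats

def against_chance(accuracies, chance=0.5):
    """One-sample t-test of per-participant 2-AFC proportions correct vs. 50%."""
    return stats.ttest_1samp(np.asarray(accuracies), popmean=chance)

# Hypothetical proportions correct for one condition (e.g., low-TP 'words')
acc = [0.56, 0.62, 0.50, 0.69, 0.44, 0.62, 0.56, 0.62, 0.69, 0.56]
result = against_chance(acc)
print(round(result.statistic, 2), round(result.pvalue, 3))
```

A significant positive statistic indicates above-chance discrimination of 'words' from foils; the same call is run per group, task, and type of 'word'.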
To account for the topographical distribution of the abovementioned EEG deflections, mean amplitude values were obtained for the topographical regions where amplitudes were maximal: the fronto-central region of interest (ROI; F1, Fz, F2, FC1, FCz, FC2, C1, Cz, and C2) for the N100 in children, and the central ROI (FC1, FCz, FC2, C1, Cz, C2, CP1, CPz, and CP2) in all other cases.
For both behavioral and ERP data, main or interaction effects that reached statistical or marginal significance in comparisons of interest are reported. The Greenhouse-Geisser correction for non-sphericity was applied when appropriate. Post-hoc tests for multiple comparisons were adjusted with the Bonferroni correction. Measures of effect size (partial eta squared, ηp²) and observed power (pw) are reported in combination with the main effects of condition.

Declarations

Data Availability Statement

The datasets generated and analysed in the current study are available at Soares

Figure captions (excerpts). Values of the topographical images range from -0.5 to 1 µV in adults and from -3 to 3 µV in children. "IMP" stands for the aSL task performed under implicit conditions, whereas "EXP" stands for the aSL task performed under explicit instructions. Grey shadowed rectangles indicate the analyzed time windows.

Visual Summary of the Experimental Design. Note: Panels A to G illustrate the timeline of the experimental procedure in which the implicit and, subsequently, the explicit aSL task were administered. Each aSL task comprised three parts: instructions, familiarization phase, and test phase. Each task was initiated with specific instructions (Panels A and E) that determined the conditions under which the aSL task was performed: A. implicit instructions (i.e., without knowledge of the stimuli or the structure of the stream) or