Participants
We recruited 50 native speakers of Spanish, aged 18–40, with low-to-intermediate knowledge of English. Participants were recruited following an individual interview with an expert linguist, who assessed their English level and assigned marks from 1.0 to 5.0 (1.0 = low; 5.0 = native-like). The interview evaluated fluency, vocabulary, grammar, and pronunciation, which were then combined into an overall mark. We only recruited participants who obtained an overall mark between 1.0 and 3.0 (NDS group: Mmark = 1.8, SD = 0.45; NNDS group: Mmark = 1.9, SD = 0.32). Participants were randomly assigned to one of two groups (25 participants each), exposed to either NDS or NNDS (NDS group: Mage = 26.76 years, SD = 6.55, 3 males; NNDS group: Mage = 27.36 years, SD = 6.48, 3 males). In addition, at the end of the experimental session, participants completed a Raven matrices test and a pseudoword repetition task in Spanish, used as indices of non-verbal IQ and phonological memory, respectively (Clark et al., 2012; Kaufman & Kaufman, 2014; see Appendix 4 for a description of these tasks). All participants signed an informed consent form before starting the experimental procedure. The study was approved by the Basque Center on Cognition, Brain and Language (BCBL) Ethics Committee and conducted in accordance with the relevant guidelines and regulations. Participants were paid 20 euros for taking part in the study.
Bayesian analyses provided no evidence that the two groups differed in age, English proficiency, non-verbal IQ, or phonological memory. Two-tailed analyses with a Cauchy prior distribution (scale γ = 0.707) indicated that the data for age, proficiency, IQ, and phonological memory were respectively 3.39, 2.25, 3.05, and 3.53 times more likely under the null than under the alternative hypothesis (Bayes factors, BF01).
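The software used for these comparisons is not stated here; the sketch below shows one way to obtain equivalent BF01 values in R with the BayesFactor package, assuming a data frame `demo` with one row per participant and hypothetical columns `group` and `age` (the same call applies to the proficiency, IQ, and phonological memory scores).

```r
# Minimal sketch of a two-tailed Bayesian group comparison (Cauchy prior, scale 0.707).
# `demo`, `group`, and `age` are hypothetical names, not taken from the original materials.
library(BayesFactor)

bf10 <- ttestBF(x = demo$age[demo$group == "NDS"],
                y = demo$age[demo$group == "NNDS"],
                rscale = 0.707)            # Cauchy prior width

bf01 <- 1 / extractBF(bf10)$bf             # evidence for the null over the alternative
bf01
```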
Material
Empirical evidence on the realisation of vowels other than /a/, /i/, /u/ (e.g., /ɪ/, /ʌ/, /æ/) in NNDS is limited. For this reason, we first ran a pilot study to assess how NNDS adaptations are realised on the /ɪ/, /ʌ/, and /æ/ vowels. We recruited five native speakers of English (British accent) who were, or had been, teachers of English with Spanish-speaking students. The results and a description of this preliminary study are reported in Appendix 1. Below, we describe the materials used in the three tasks.
Recognition task and Production task. For the present study, we created 16 novel words containing the /i-ɪ/ (e.g., [di:st – dɪst]) and /ʌ-æ/ contrasts (e.g., [gʌk – gæk]). The novel words for both vowel contrasts were minimal pairs, so that participants had to rely on the target vowels to distinguish the words. To increase item variability, we also created 8 novel words containing the /a/ and /u/ vowels (not forming minimal pairs) that served as fillers (e.g., [pha:g – fu:n]; see Appendix 2 for the full list of experimental stimuli). A set of 24 novel objects was selected to match the 16 target novel words and 8 filler words. The images were taken from the Horst & Hout (2016) novel object database and represented unknown objects and unfamiliar tools. To create the object-word pairings while avoiding any effects derived from specific relations between words and objects in our stimuli, we created 3 lists of pseudo-random associations, and the presentation of these word-object lists was counterbalanced across participants.
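Purely as an illustration, the following sketch shows one way such counterbalanced word-object lists could be generated in R; the vectors `words` and `objects` and the rotation step are hypothetical and do not reproduce the actual lists.

```r
# Illustrative construction of 3 pseudo-random word-object lists (hypothetical names).
set.seed(1)                                   # arbitrary seed for reproducibility
words   <- sprintf("word_%02d", 1:24)         # 16 targets + 8 fillers
objects <- sprintf("object_%02d", 1:24)       # 24 novel objects

shuffled <- sample(objects)                   # one random pairing order
# Rotating the shuffled order by 8 and 16 positions yields three lists in which
# no word keeps the same referent across lists.
lists <- lapply(0:2, function(k) {
  data.frame(word   = words,
             object = shuffled[(seq_along(words) + 8 * k - 1) %% 24 + 1],
             list   = k + 1)
})
```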
The stimuli were recorded by a female native speaker of British English. This speaker was chosen from the 5 speakers who participated in the pilot study as the one who best represented the observed preliminary results (see Appendix 2; wider vocalic area, longer sentence duration, larger /ʌ-æ/ Euclidean distance and /i-ɪ/ duration difference). This speaker produced novel words in NNDS with a wider vocalic area (+187%), longer sentence duration (MNNDS = 3640ms, MNDS = 3561ms), greater /ʌ-æ/ Euclidean distance (MNNDS = 358.10 Hz, MNDS = 161.96 Hz), and a larger /i-ɪ/ duration difference (MNNDS = 15ms, MNDS = 4ms) than in NDS. Conversely, she produced a smaller /i-ɪ/ Euclidean distance in NNDS than in NDS (MNNDS = 933.88 Hz, MNDS = 1169.25 Hz). All stimuli were normalised for intensity and used in both the Recognition and the Production task.
Continuum discrimination task. A female native speaker of British English, who did not record the stimuli for the other tasks, was recorded while producing the words sheep, ship, cup, cap. These recordings were used to create two seven-step continua. The sheep-ship continuum was created by gradually changing the formants and the duration of the target vowels. The cup-cap continuum was created by changing only the formants of the target vowels, as this contrast is not marked by vowel duration (Escudero, 2001, 2006). From the continua, we obtained 7 isolated word tokens ranging from sheep to ship and 7 ranging from cup to cap, which were used in this task.
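The acoustic manipulation was carried out on the recordings themselves; purely to illustrate the stepwise design, the sketch below computes 7 equally spaced target values between hypothetical endpoints (the formant and duration values shown are placeholders, not the speaker's measurements).

```r
# Illustrative 7-step interpolation between continuum endpoints (placeholder values).
steps  <- 7
interp <- function(from, to) seq(from, to, length.out = steps)

sheep_ship <- data.frame(step = 1:steps,
                         F1   = interp(300, 400),    # Hz, placeholder endpoints
                         F2   = interp(2300, 2000),  # Hz, placeholder endpoints
                         dur  = interp(120, 80))     # ms; duration varies for /i:-ɪ/ only

cup_cap <- data.frame(step = 1:steps,
                      F1 = interp(700, 760),         # Hz, placeholder endpoints
                      F2 = interp(1250, 1550))       # formants only; no duration change
```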
Procedure
The experiment was administered online via PennController for Ibex (Zehr & Schwarz, 2018), a JavaScript-based platform. During the session, participants remained connected with the experimenter via Zoom™, but video streaming was always disabled. This allowed the experimenter to verify that participants’ microphones worked properly and that they stayed focused on the task, without participants feeling observed during the session. We asked participants to wear headphones and, if available, a head-mounted microphone, but any type of microphone with acceptable quality was allowed. Before the start of the experiment, participants recorded and played back their own voice to self-check audio quality. Compliance with headphone use was confirmed using a screening test (Woods et al., 2017). After that, the experimental session followed this order: Continuum discrimination task (pre-test), Familiarisation phase, Recognition task, Production task, Continuum discrimination task (post-test), Raven matrices test, Pseudoword repetition task.
Continuum discrimination. The task began by displaying two images on the screen, one at a time (either a sheep and a ship or a cup and a cap, in counterbalanced order across participants). For each image, participants were presented with an auditory recording of the image’s name pronounced in NDS. Then the task started: participants used the mouse to click a button in the centre of the screen to listen to the stimuli, which were presented one continuum step at a time in random order. The two pictures previously displayed (a sheep and a ship, or a cup and a cap) appeared on the screen, and participants were asked to click on the picture corresponding to the word they heard. Each of the 7 continuum steps (endpoints and intermediate steps) was repeated 6 times (42 trials per contrast). After completing the block corresponding to the first two images (e.g., sheep and ship), the same procedure was followed for the other minimal pair (e.g., cup and cap). The pre-test and post-test followed exactly the same procedure.
Familiarisation phase. The object-word pairs were presented once during this phase. Participants were exposed to the novel objects together with the auditory version of their names, embedded in a carrier phrase (e.g., “this is a deest”). Each object image was displayed and, after 250ms, the phrase containing its label was played. Next, a button appeared on the screen and participants clicked on it to proceed to the next object. Each sentence was pronounced in either NNDS or NDS, depending on the participants’ group allocation. Target and filler novel words were presented in a random order, and no response was required from participants (passive learning task).
Recognition task. Participants saw images of 4 objects on the screen and heard a sentence used in the familiarisation phase (e.g., “this is a deest”). The 4 objects comprised the target object (e.g., the referent of deest), a competitor (e.g., the referent of dist on another trial) and two distractors (e.g., the referents of gack and phoon on other trials). Participants used the mouse to click a button in the centre of the screen to hear the cue-sentence. Then, the objects were displayed on the screen until participants responded by clicking on one of the 4 objects. As soon as they did so, all the objects disappeared and the correct one was displayed in the centre of the screen for 2500ms, providing feedback on the correct answer. Each block included 24 trials (16 experimental trials + 8 fillers) and participants completed 6 blocks in a row (in total, 96 target trials + 48 fillers). In this way, each block served both as a test and as further training on the novel words. Stimulus presentation was pseudo-randomised to prevent the same target vowel from appearing more than twice in a row.
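As an illustration of this constraint, the sketch below rejects and reshuffles any trial order in which a target vowel occurs more than twice in a row; `trials` and its `vowel` column are hypothetical names, not part of the original experiment script.

```r
# Pseudo-randomisation sketch: reshuffle until no target vowel repeats more than twice in a row.
randomise_block <- function(trials) {
  repeat {
    shuffled <- trials[sample(nrow(trials)), ]
    run_lengths <- rle(as.character(shuffled$vowel))$lengths
    if (all(run_lengths <= 2)) return(shuffled)   # accept only orders meeting the constraint
  }
}
```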
Production task. Participants were presented with the same 24 objects used in the recognition task. The objects were displayed one at a time on the screen and participants were asked to name each of them. As soon as an object was displayed, the browser started recording participants’ oral responses. The object remained on the screen until participants clicked the button ‘Send your response’. The microphone continued recording for 500ms after the response was sent, to avoid responses being trimmed by an early button press. After sending their response, participants heard the novel word embedded in the carrier phrase, as in the recognition task, which served as feedback. Then, the next trial began with a new object displayed on the screen. This procedure continued until all object-word pairs had been presented (in random order), and the whole set was repeated in 6 consecutive blocks (96 target trials + 48 fillers).
Measures and statistical analysis
Recognition task. For this task we extracted 1) response accuracy across the 6 blocks: offline, scores of 0 and 1 were assigned to incorrect and correct responses, respectively. We also measured 2) response latencies across blocks, from the moment the cue-sentence finished playing to the moment participants provided an answer. Only correct answers were included in the latency analysis.
Production task. We measured 1) response accuracy across blocks. As a measure of production accuracy, we computed the Aline distance between each participant’s phonetic production and the target pronunciation of the stimuli with the alineR package (Downey et al., 2008, 2017) in R (R Core Team, 2021). This measure results from feature-weighted linguistic distance calculations and yields values from 0 to 1, where 0 represents a perfect production-target match and 1 represents answers containing sound categories unrelated to the target. We also calculated 2) response latencies across blocks, measured from object presentation until participants responded orally. Furthermore, based on the values of the first (F1) and second (F2) formants, we computed 3) the Euclidean distance between participants’ vowel productions and the stimulus speaker’s pronunciation of the vowel stimuli (henceforth just vowel stimuli), and 4) the Euclidean distance within participants’ vowel contrast productions (/ʌ-æ/ and /i-ɪ/). For the two Euclidean distance variables, formant values of participants’ vowel productions (and of the vowel stimuli) were normalised using the Lobanov method (Lobanov, 1971), which converts each speaker’s formant values into z-scores (centred on that speaker’s mean and scaled by that speaker’s standard deviation). This was done to prevent participants’ physiological differences from driving the observed effects.
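The following sketch illustrates the normalisation and distance measures under assumed column names (`speaker`, `F1`, `F2` in a data frame `formants`); it is a minimal example, not the analysis script itself.

```r
# Lobanov normalisation (within-speaker z-scores) and Euclidean distance in F1-F2 space.
library(dplyr)

normalised <- formants %>%
  group_by(speaker) %>%
  mutate(F1_z = (F1 - mean(F1)) / sd(F1),
         F2_z = (F2 - mean(F2)) / sd(F2)) %>%
  ungroup()

# Distance between two tokens (e.g., a participant's vowel vs. the stimulus speaker's
# vowel, or the two members of a contrast produced by the same participant).
euclid <- function(f1_a, f2_a, f1_b, f2_b) {
  sqrt((f1_a - f1_b)^2 + (f2_a - f2_b)^2)
}
```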
All responses that were incorrect or that, despite some similarity with the target, clearly pointed to a distractor were excluded from the analyses of 2), 3), and 4). For example, if a participant said [pi:fəl] for the object associated with the novel word [pi:v], they would have obtained an Aline score of 0.31, but their response was considered incorrect and excluded from the latency and Euclidean distance analyses because it pointed to the distractor [bi:fəl]. The excluded trials represented 39.58% of the total responses. A total of 2900 trials were kept for analysis: 1559 in NNDS and 1341 in NDS (BF01 = 1.89, anecdotal evidence for H0).
The dependent variables of the Recognition and Production tasks were analysed independently using growth curve analysis (GCA) models (Mirman, Dixon, et al., 2008; Mirman, Magnuson, et al., 2008) fitted in R (lme4 package; Bates et al., 2015; see Appendix 3 for a list of the models). This technique is explicitly designed to assess changes over time at the group and individual levels. GCA allowed us to add linear and quadratic polynomial terms to the models: the linear term captures the overall slope of the effect, and the quadratic term its curvature (i.e., the change in slope across learning blocks). Thus, the 6 blocks were added to the models as a Block factor, including linear and/or quadratic terms depending on the best model fit. The Register (NNDS and NDS) and Contrast (Single and Goodness contrasts) factors, together with the Block factor, were added to the models as fixed effects (unless otherwise specified). Subjects and novel words were included as random effects. Other predictors were considered only if they improved the model fit (see Appendix 3 for a list of the final models). Starting from the minimal structure, various models were created; the final models were chosen according to the best fit, as indicated by the performance package in R (Lüdecke et al., 2021). For all models, we set a priori sum contrasts, so that within Register, -0.5 was assigned to NDS and +0.5 to NNDS, whereas within the Contrast factor, -0.5 was assigned to Category Goodness and +0.5 to Single Category (Schad et al., 2020). Response latencies were transformed using the Box-Cox method (Box & Cox, 1964). Accuracy in the Recognition task, in contrast, was tested by fitting GCA with generalised linear mixed-effects (glmer) models (binomial family). Given the distribution of the Aline distance, these results were fitted with a generalised mixed-effects model of the beta family. Each of the two Euclidean distance measures was tested in two separate models (one for each contrast: Single and Goodness).
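A schematic version of such a model is sketched below, under assumed column names (`acc`, `block`, `register`, `contrast`, `subject`, `word`); the actual fixed and random effects structures followed the model-selection procedure described above and are listed in Appendix 3.

```r
# Sketch of a GCA model for recognition accuracy (hypothetical column names).
library(lme4)

dat$register <- factor(dat$register, levels = c("NDS", "NNDS"))
dat$contrast <- factor(dat$contrast, levels = c("Goodness", "Single"))
contrasts(dat$register) <- c(-0.5, 0.5)        # NDS = -0.5, NNDS = +0.5
contrasts(dat$contrast) <- c(-0.5, 0.5)        # Goodness = -0.5, Single = +0.5

# Orthogonal linear and quadratic terms for the 6 learning blocks
dat[, c("block_lin", "block_quad")] <- poly(dat$block, 2)

m_acc <- glmer(acc ~ (block_lin + block_quad) * register * contrast +
                 (1 | subject) + (1 | word),
               data = dat, family = binomial)
```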
Continuum discrimination. For this task, we used a generalised linear mixed-effects model (binomial family) to compare vowel discrimination between the pre-test and the post-test (Exposure factor) and between the two speech register groups. Polynomial terms were not included because GCA did not apply to this variable (there were only two test points, pre and post). The sheep/ship and cup/cap continua were tested in separate models.
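The sketch below shows one plausible specification of this model under assumed column names (`resp`, coded 1 for a sheep/cup response, `step`, `exposure`, `register`, `subject`); whether continuum step entered the reported models is not specified here, so its inclusion is an assumption.

```r
# Sketch of the continuum-discrimination model (hypothetical column names).
m_cont <- glmer(resp ~ step * exposure * register + (1 | subject),
                data = continuum_data, family = binomial)
```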
For all tasks, model significance was tested with the lmerTest package (Kuznetsova et al., 2017), and interactions between main effects were explored by running post-hoc analyses with the emmeans package (Lenth et al., 2019), using Tukey HSD correction for multiple comparisons. Given the number of interactions tested in each model, below we report only the significant interactions; all results, including non-significant ones, are reported in the Data Availability section.
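For illustration, a post-hoc decomposition of an interaction could look like the following call, using the hypothetical recognition-accuracy model from the sketch above.

```r
# Post-hoc pairwise comparisons with Tukey adjustment (illustrative call).
library(emmeans)
emmeans(m_acc, pairwise ~ register | contrast, adjust = "tukey")
```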