Participants
Thirty adults (10 males; mean age 27 ± 3.72 years) participated in the overnight experiment. None had a self-reported history of neurological, sleep, or motor disorders. All participants completed a screening questionnaire before selection, provided written informed consent, and were reimbursed for their time. The experiment was approved by the School of Psychology Ethics Committee at Cardiff University. Participants agreed to abstain from caffeine and alcohol during the study and for 24 hours before it. Of the thirty participants who completed the task, 10 were excluded either because of technical problems (n = 3) or because they did not have enough stable SWS to perform the stimulations (n = 7; we required 12 rounds, and these participants spent most of the night in light sleep and stage N2). Of the remaining 20 participants, 3 could not complete Session 3 due to the pandemic.
Materials
The behavioral tasks were presented in a quiet room; participants were comfortably seated in front of the computer, and stimuli were presented using Matlab©, Psychtoolbox (Brainard 1997) and Cogent 2000 (www.vislab.ucl.ac.uk). Three types of visual stimuli were presented to the participants: female faces (Lundqvist, Flykt, and Ohman 1998), outdoor scenes (taken from the internet), and unusual objects (Horst and Hout 2016); see Fig. 1B. Each stimulus was easily distinguishable from the others within and between categories. All items were presented in greyscale and matched for luminance. Each image was associated with an exclusive sound, as semantically congruent with the image as possible (e.g., a bike image with a bike-bell sound).
Sounds were taken from the internet, truncated into two lengths (2 s and 200 ms), and pitch-normalized. We used the longer sounds in behavioral training, to facilitate sound-image encoding, and the shorter versions for the rest of the behavioral tasks and for TMR cueing. The sounds were played through noise-cancelling headphones (Sony MDR-ZX110NA) during the behavioral tasks and through speakers (Dell A225) during sleep. The order of presentation of the stimulus categories was counterbalanced across participants, and the order of stimuli within each category was fully randomized for each subject. Hence, the experimenter was completely blind to which stimuli formed each hierarchy and their order within each condition (Up, Down, Control), and to which stimulus type (faces, objects, or scenes) was assigned to each condition, so as not to influence the results. Each of the three hierarchies (Fig. 1B,C) comprised 6 images, each with an associated (highly discriminable) sound. We prepared a set of 12 images and 12 sounds per hierarchy, i.e., 36 images and 36 sounds in total. Before participants started the first task, for each of the three hierarchies, 6 of these images with their corresponding sounds were randomly selected to be learned, and the remaining 6 sounds were used as controls to be played during TMR stimulation; the experimenter was blind to this selection.
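The random split of each 12-item pool into learned and control stimuli can be sketched as follows. This is a minimal illustration in Python rather than the original Matlab code, and the function and variable names are our own:

```python
import random

def assign_stimuli(pool_ids, n_learned=6, seed=None):
    """Randomly split a 12-item pool into a to-be-learned set and a
    control set (sounds played during TMR but never learned)."""
    rng = random.Random(seed)
    ids = list(pool_ids)
    rng.shuffle(ids)
    return ids[:n_learned], ids[n_learned:]

# one 12-item pool per hierarchy
pools = {name: [f"{name}_{i}" for i in range(12)]
         for name in ("faces", "scenes", "objects")}
selection = {name: assign_stimuli(pool) for name, pool in pools.items()}
```

Because the split is drawn at the start of the session, the experimenter never knows which items end up in the learned versus control sets.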
Procedure
Participants arrived at the laboratory around 8 pm and changed into their sleepwear. They reported their alertness by completing the KSS (Åkerstedt and Gillberg 1990) and SSS (Hoddes, Dement, and Zarcone 1972) questionnaires. Afterwards, they were fitted for PSG recording and performed the initial training and the immediate test described in the “Experimental tasks” section and Fig. 1A. Participants were ready for bed around 11 pm. During the night, the previously learned tones were played softly during SWS. Of the three stimulus categories, one was kept as a control (Control) and was not played during the night, allowing us to compare it against the other two, which were cued during the Up and Down phases of the SOs, respectively. After 7–8 hours of sleep, participants were woken at the agreed time and allowed > 20 min to overcome sleep inertia. During this time, they could go to the toilet, eat, and drink before completing the sleep quality, KSS, and SSS questionnaires. Participants then completed the Late test and another Sound-Image association test (“Experimental tasks” section and Fig. 1A). Afterwards, the electrodes were removed, and participants could shower or go home. Finally, participants returned to the laboratory two weeks later (± 2 days) to complete the second Late test and Sound-Image association test, identical to the previous ones but without EEG recordings, to test the robustness of sleep-TMR-mediated benefits.
Experimental tasks by session
The experiment comprised three sessions: evening (Session 1), the next morning (Session 2), and a follow-up two weeks later (Session 3). Each session was divided into the tasks described below:
a) Session 1: Sound-image association learning task: For each of the three categories, participants were shown each of the six items forming the category one by one. At the same time the associated sound (2s length) was played. Each sound-image pair was shown 4 times. The order of items within a category was randomized and the order of the categories themselves was counterbalanced across participants.
Sound-Image association test: Immediately after training, all participants performed a recall session to determine their retention level. Three images were presented on the screen while a sound was played. Participants were asked to select, as quickly and accurately as possible, the image corresponding to the sound using the keyboard arrow keys. When they responded, a rectangle surrounded the correct image (green if the participant’s selection was correct, red if it was wrong). Image screen positions were randomized on every trial. The three images presented on the screen were pseudo-randomly selected, with the restriction that at least one of the two incorrect images was a ‘lure’ from the same category as the correct answer. The sounds in this test were the 200 ms versions. Participants performed two blocks, with three repetitions of each sound per block. At the end of each block, accuracy was displayed.
Premise pair learning task: Following previous related experiments (Ellenbogen et al. 2007; Werchan and Gómez 2013), all participants learned five relational premise pairs for each of the three categories. If each category formed a 6-item hierarchy, schematically represented as A > B > C > D > E > F (see Fig. 1C), the premise pairs were A > B, B > C, C > D, D > E, and E > F, where the notation "A > B" indicates "select A over B". The pairs were presented one at a time, with the images stacked vertically (Fig. 1E). Subjects were instructed to select the item "hiding" a smiley emoticon from the two presented, at first by trial and error; after practice and feedback, they learned which item was correct. If they selected the correct item, it was replaced by a smiley emoticon; if they selected the wrong image, it was replaced by an angry emoticon. This is in line with Werchan and Gómez (2013), where a smiley emoticon was used as reinforcement. After the feedback, participants received a second reinforcement: the pair was presented again, this time horizontally instead of vertically and in the correct order (e.g., A-B) from left to right, with the corresponding sounds also played in the correct order. Pairs were organised into blocks of 10 trials per hierarchy, i.e., a total of 30 trials per block. Each block presented each of the five pairs of each hierarchy twice, counterbalancing the up-down positions (e.g., A above B and B above A, with A being the correct selection in both cases). The three hierarchies were not mixed within a block: for example, first all pairs of the “scenes” category were presented, then the “faces” pairs, and finally the “object” pairs. This order was counterbalanced across participants. Within each category, pairs were ordered pseudo-randomly to avoid explicitly revealing the hierarchy; hence, a displayed pair could not contain an item shown in the immediately preceding pair (e.g., A > B was never followed by B > C).
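The pseudo-random ordering constraint above (consecutive trials never share an item) can be illustrated with a simple rejection-sampling sketch. This is a Python stand-in for the original Matlab task code, with names of our own choosing:

```python
import random

# premise pairs of a 6-item hierarchy A > B > ... > F
PAIRS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

def order_block(pairs=PAIRS, seed=None, max_tries=10000):
    """Return one 10-trial block: each pair twice (top-bottom positions
    swapped), ordered so that consecutive trials never share an item."""
    rng = random.Random(seed)
    trials = [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if all(not set(trials[i]) & set(trials[i + 1])
               for i in range(len(trials) - 1)):
            return list(trials)
        # otherwise reject the shuffle and try again
    raise RuntimeError("no valid ordering found")
```

A valid ordering always exists for a doubled 5-pair chain (e.g., AB, CD, EF, BC, DE repeated), so the rejection loop terminates quickly in practice.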
Furthermore, the order of the items within each hierarchy was randomly selected for each participant at the start of training and remained unknown to the experimenter. At the end of each block, the overall performance for that block was shown on the screen to keep participants engaged with the task. All subjects underwent a minimum of three training blocks. After the third block, and every block thereafter, only performance on the "middle pairs" (B-C, C-D, and D-E) was used to calculate the exit criterion (Werchan and Gómez 2013). If the average performance on these pairs exceeded 66% in two of the last three blocks for one of the hierarchies, the participant stopped receiving feedback for that hierarchy. However, all the premise pairs of that category still appeared on the screen, to ensure the same number of trials/appearances for each hierarchy. This continued until the participant reached criterion for all three hierarchies, or for a maximum of 10 blocks. In contrast to Ellenbogen et al. (2007) and Werchan and Gómez (2013), where the exit criterion was set to 75% accuracy on the middle premise pairs, we used a criterion of 66% to avoid ceiling effects and increase the chances of overnight improvement. On the other hand, we added a more restrictive requirement of 2 blocks out of 3 meeting the threshold, to ensure that the criterion was not reached by chance. Similarly to the above-mentioned studies, we evaluated the exit threshold on the middle premise pairs only, because they are the items necessary for building the inferences.
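The exit criterion described above (middle-pair accuracy above 66% in two of the last three blocks, after at least three blocks) can be expressed compactly. This is an illustrative sketch, not the original task code:

```python
def reached_criterion(middle_acc, threshold=0.66):
    """middle_acc: per-block accuracy on the middle pairs (B-C, C-D, D-E).
    The criterion is met once at least 3 blocks have been run and the
    middle-pair accuracy exceeds `threshold` in 2 of the last 3 blocks."""
    if len(middle_acc) < 3:
        return False
    return sum(acc > threshold for acc in middle_acc[-3:]) >= 2
```

In the experiment this check would be evaluated per hierarchy after every block, switching off feedback for any hierarchy that passes.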
Immediate test: After the criterion was met, participants had a 5-minute break before proceeding to the immediate test, which assessed initial retention of the learned pairs. The testing protocol was similar to training, except that feedback and sound cues were removed. Subjects were informed that they had to select the correct item based on their previous learning. Participants performed four blocks, with 10 trials per hierarchy. Between blocks, participants solved arithmetic problems to clear short-term memory (von Hecker, Klauer, and Aßfalg 2019). Furthermore, pairs from the different hierarchies were randomly interleaved, always with the restriction of not revealing the hierarchy explicitly.
b) Session 2: Late test: After filling in the KSS and SSS questionnaires, participants performed a test similar to the previous one, but this time they were presented with the previously learned premise pairs, new inference pairs, and one ‘anchor pair’, such that a total of 9 pairs were seen instead of just the 5 previously learned. The first new pairs were 3 inference pairs: B > D, B > E, and C > E (see Fig. 1D). These are named inference pairs because if you know that B > C and C > D, then you can infer that B > D. The inference pairs can be divided by degree of separation, i.e., the number of items between the pair's items: between B and D there is only one item (C), so B-D has a first degree of separation, as does C-E. Between B and E, on the other hand, there are two items (C and D); hence the B-E pair has a second degree of separation and is the most distant pair within a 6-item hierarchy. Additionally, we added a 4th pair, the anchor pair (A > F), as a control, since no inference is needed to obtain this relationship: A is always correct and F is always incorrect (von Hecker, Klauer, and Aßfalg 2019). Participants were instructed that they might see novel combinations, and that in such cases they should make their best guess. At the end of each trial, they rated their confidence from −2 (guessing) to +2 (certain) using the up and down arrow keys. Following a similar protocol to Session 1, participants performed four blocks, with math exercises between them.
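The inference pairs and their degrees of separation follow directly from the hierarchy. A small sketch (our own illustration, not part of the study code) that enumerates them:

```python
def inference_pairs(hierarchy):
    """Enumerate the tested inference pairs of a transitive hierarchy:
    non-adjacent pairs among the inner items (the end anchors, e.g. A
    and F, need no inference), mapped to their degree of separation,
    i.e. the number of items standing between the two."""
    inner = hierarchy[1:-1]                 # B..E for a 6-item hierarchy
    pairs = {}
    for i in range(len(inner)):
        for j in range(i + 2, len(inner)):  # skip adjacent premise pairs
            pairs[(inner[i], inner[j])] = j - i - 1
    return pairs

pairs = inference_pairs(list("ABCDEF"))
# → {('B', 'D'): 1, ('B', 'E'): 2, ('C', 'E'): 1}
```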
Sound-Image association test: After a 5-minute break, subjects performed a new sound-image association test with the same structure as Session 1’s test but without feedback.
c) Session 3: This session used the same tasks, in the same order, as Session 2; however, this time participants’ brain activity was not recorded. Finally, the participants completed an Awareness questionnaire (see Fig. 1–1).
Closed-loop TMR protocol
The two categories used for TMR and the control category were counterbalanced across participants (see SM3). The control category was not cued during the night. Of the other two, one was assigned to Up and the other to Down. Stimulation started after participants entered stable SWS and was halted upon arousals or transitions to any other sleep stage. Participants were also exposed to an extra control hierarchy of sounds for each of the two TMR conditions. These control-hierarchy sounds, also 200 ms in duration, were completely novel to the participants and were included to allow us to distinguish the TMR effect from a normal brain ERP. Each hierarchy was composed of 6 items played in order: A, B, ..., F. The order of the experimental and control hierarchies, and of Up and Down cueing, was randomized and counterbalanced across blocks. Each block comprised four hierarchies: Experimental Up, Control Up, Experimental Down, Control Down (see Fig. 1F). The minimum number of blocks for a participant to be included in the analysis was 12, i.e., at least 288 cues presented during the night. Online detection of SOs was based on detecting the negative half-wave peaks of the oscillations. The electrode used for online detection was F3, as frontal regions are a predominant SO area (Massimini et al. 2007); its signal was band-pass filtered in the slow-wave range (0.5-4 Hz). When the amplitude of the signal crossed a threshold of −80 µV, the auditory stimulus was delivered after a fixed delay of 500 ms (Ngo et al. 2013, 2015). Inter-trial intervals were set to a minimum of 4 seconds, i.e., after every sound played there was a pause of at least 4 s. SO detection, auditory stimulation, and trigger presentation to the EEG recording were handled by a custom Matlab-based toolbox (https://github.com/mnavarretem).
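The online detection logic (threshold crossing at −80 µV, fixed 500 ms delay, minimum 4 s between cues) can be approximated as follows. This is a simplified offline illustration in Python, not the actual Matlab toolbox, and it assumes a signal already band-pass filtered at 0.5-4 Hz:

```python
import numpy as np

def detect_cue_times(signal, fs, thresh=-80.0, delay=0.5, min_iti=4.0):
    """Return cue-delivery times (s) for a closed-loop scheme: whenever
    the SO-band signal crosses below `thresh` (µV), a cue is scheduled
    `delay` s later, with at least `min_iti` s between detections.
    Assumes `signal` is already band-pass filtered at 0.5-4 Hz."""
    cues, last = [], -np.inf
    for i in range(1, len(signal)):
        t = i / fs
        # downward threshold crossing outside the refractory period
        if signal[i] < thresh <= signal[i - 1] and t - last >= min_iti:
            cues.append(t + delay)
            last = t
    return cues
```

The real implementation runs sample by sample in real time and additionally gates stimulation on the current sleep stage; those aspects are omitted here.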
EEG recordings
Sleep was recorded using standard polysomnography, including EEG, electromyography (EMG), and electrooculography (EOG). EEG was recorded using a 64-channel LiveAmp amplifier (Brain Vision©). Electrode impedance was kept below 10 kΩ, the sampling rate was 500 Hz, and recordings were referenced to the CPz electrode. In addition to the online identification of sleep stages, polysomnographic recordings were scored offline by 3 independent raters according to the AASM criteria (Berry et al. 2015), all of whom were blind to the periods when the sounds were reactivated.
EEG analysis
Pre-processing and analysis were performed with Fieldtrip (Oostenveld et al. 2011) and custom Matlab functions. Data were low-pass and high-pass filtered (30 Hz and 0.5 Hz, respectively). Eye- and muscle-related artefacts were removed using independent component analysis (ICA). Bad channels were interpolated (spline interpolation) and the data were re-referenced to linked mastoids. We calculated ERPs by segmenting the cleaned signal into 4-second segments, from −1 s before stimulus onset to 3 s afterwards. A final visual inspection of the dataset was performed, and any residual artefacts were manually removed. To calculate the differences between stimulation conditions we averaged across all trials, whereas for the classification analysis we kept single-trial information. To study the time-frequency response evoked by TMR, we calculated the power spectrum of the signal locked to TMR cue onset using Morlet wavelets from 4 to 20 Hz with 0.5 Hz resolution, in a time window from −1 to 2.4 s in 50 ms steps, at the subject level. The width of the wavelet was set to at least 4 cycles per time window, adaptively to the frequency of interest. The resulting TFRs were then expressed as relative change from the −1 s to 0 s pre-stimulus baseline.
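The baseline normalization of the TFRs (relative change from the −1 to 0 s pre-stimulus window) amounts to the following. A minimal Python sketch of the computation, assuming a frequencies × time power array (the study used Fieldtrip's equivalent routine):

```python
import numpy as np

def relative_change(tfr, times, base=(-1.0, 0.0)):
    """Express a (frequencies x time) power array as relative change
    from the mean power in the pre-stimulus baseline window."""
    mask = (times >= base[0]) & (times <= base[1])
    baseline = tfr[:, mask].mean(axis=1, keepdims=True)
    return (tfr - baseline) / baseline
```

A value of 0 thus means power equal to baseline, and 1 means a doubling of power relative to baseline.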
Statistics
Statistical assessment of the EEG data was based on nonparametric cluster-based permutation tests, computed with the Fieldtrip toolbox (Oostenveld et al. 2011), with the following parameters: 2,000 permutations, two-tailed, cluster threshold of p < 0.05, and a final threshold of p < 0.05. The time-frequency statistical analysis was restricted to the post-cue interval (0 to 2.4 s) to avoid the natural differences between the Up and Down phases of the SOs before cue onset. To examine the accuracy of the TMR protocol, circular statistics were computed with the R package Circular (Jammalamadaka and SenGupta 2001). Behavioral analyses were performed using robust statistical methods from the R package WRS2 (Mair and Wilcox 2020) to avoid possible violations of normality and homoscedasticity assumptions. Repeated-measures analysis of variance (RM-ANOVA) or simple one-way ANOVA was performed accordingly, always keeping the trial information and adding individual differences (subject IDs) to the analysis. One-sample tests (Student's t-test or Wilcoxon signed-rank) tested for differences from chance level (50%) for each group, Condition, and Session of interest. The significance of Pearson correlations between classification performance and behavior was assessed using a bootstrap method implemented in the R "boot" package (Buckland, Davison, and Hinkley 1998).
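The bootstrap test of a Pearson correlation resamples participants with replacement and builds a confidence interval from the resampled coefficients. A Python sketch of the idea (the study itself used the R "boot" package; function names here are ours):

```python
import numpy as np

def bootstrap_corr(x, y, n_boot=2000, seed=0):
    """Pearson correlation with a 95% bootstrap confidence interval,
    obtained by resampling (x, y) pairs with replacement."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # one bootstrap resample of subjects
        rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.percentile(rs, [2.5, 97.5])
    return np.corrcoef(x, y)[0, 1], (lo, hi)
```

A correlation is then judged significant at the 5% level when the interval excludes zero.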
Classification
Classification of single-trial data was performed using MVPA-Light (Treder 2020) for each participant and each time point (−1 to 3 s), using the sleep-ERP values (filtered between 4 and 20 Hz) of the 60 EEG channels as features. The performance of two classifiers was compared: linear discriminant analysis (LDA) and a support vector machine with a linear kernel (SVM). We used 5-fold cross-validation with 2 repetitions, and principal component analysis (PCA) to reduce dimensionality (n = 20 components). The data within each fold were z-scored to avoid bias. Additionally, we used two different metrics to evaluate the performance of each classifier: traditional accuracy (ACC), defined as the percentage of correct predictions, and the area under the curve (AUC), i.e., the trade-off between the true-positive and false-positive rates. Once classifiers had been computed for each participant, we performed a between-subject cluster analysis (Treder 2020) to determine at which time points the Experimental and Control sounds differed statistically in each condition. All analysis code for this study is available at https://github.com/Contrerana.
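The per-fold pipeline (z-scoring and PCA fitted on the training fold only, then a linear classifier) can be sketched in plain NumPy. This is an illustration of the procedure rather than the MVPA-Light implementation, and it uses a ridge-regularized two-class LDA as the classifier:

```python
import numpy as np

def lda_fit(X, y, reg=1e-3):
    """Two-class LDA: weight vector from the pooled within-class
    covariance (with a small ridge term) and the class means."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    Sw += reg * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b

def cv_accuracy(X, y, n_folds=5, n_pca=20, seed=0):
    """K-fold CV where z-scoring and PCA are fit on the training fold
    only, so no test information leaks into the transforms."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for k in range(n_folds):
        te = folds[k]
        tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mu, sd = X[tr].mean(0), X[tr].std(0) + 1e-12
        Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd
        # PCA components estimated from the training data only
        _, _, Vt = np.linalg.svd(Xtr - Xtr.mean(0), full_matrices=False)
        P = Vt[:min(n_pca, Vt.shape[0])].T
        w, b = lda_fit(Xtr @ P, y[tr])
        pred = ((Xte @ P) @ w + b > 0).astype(int)
        accs.append((pred == y[te]).mean())
    return float(np.mean(accs))
```

In the study this evaluation was repeated per participant and per time point, with AUC computed alongside accuracy; the sketch shows only the fold logic.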