The study was pre-registered at aspredicted.org (study name: "Sounds_Online", trial identifier: #67702, https://aspredicted.org/d5j7j.pdf) on 06/04/2021. An a priori power analysis for the group × time interaction effect in a repeated measures ANOVA was conducted in G*Power 3.1.9.7 with f = 0.10, α = 0.05, power = .90, 4 groups, and a correlation between repeated measures of r = 0.60. This resulted in a minimum required sample size of N = 288.
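The sketch below shows how the reported inputs translate into the noncentral F test behind the power analysis. It assumes three things:

- G*Power's default effect size specification for a within-between interaction, with noncentrality λ = f²·N·m/(1 − ρ)·ε.
- m = 2 measurements (pre and post). This is our reading of the design.
- Nonsphericity correction ε = 1. With only two measurements, sphericity holds trivially.

Power is then the probability that a noncentral F variate with these parameters exceeds the critical F at α = .05.

$$
\begin{aligned}
\lambda &= f^2 \, N \, \frac{m}{1-\rho}\,\varepsilon, \qquad df_1 = (k-1)(m-1)\,\varepsilon, \qquad df_2 = (N-k)(m-1)\,\varepsilon \\
\lambda &= 0.10^2 \cdot 288 \cdot \frac{2}{1-0.60} = 14.4, \qquad df_1 = (4-1)(2-1) = 3, \qquad df_2 = (288-4)(2-1) = 284
\end{aligned}
$$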
5.1. Recruitment and In- and Exclusion Criteria
The study was programmed using Inquisit 5 17 [https://www.millisecond.com] and run on the Millisecond server. Participants were recruited via the crowdsourcing platform Prolific and received €10 as reimbursement for full participation. Adult individuals were pre-screened on Prolific (i.e., the study was visible only to candidates with a suitable profile) for three criteria:

- fluent German language skills (the study language);
- no diagnosed lifetime mental illness;
- no hearing difficulties.

Pre-screened individuals could then access the study, where in- and exclusion criteria were checked further. These included no regular substance or drug intake, no suicidal thoughts or tendencies, and access to headphones for the purpose of the study.
5.2. Sample
Initially, N = 401 individuals started the survey. Of these, n = 76 quit during the sociodemographic assessment, n = 24 lacked pre-test data, and n = 6 lacked post-test data. These n = 106 cases were excluded from the analyses, resulting in a final sample of N = 295. Some of these participants had incomplete post-test data (digit span: n = 10; n-back: n = 5; qualitative sound ratings: n = 8). On average, participants were in their mid-to-late twenties, and there were slightly more males than females (see Table 2 for details). Positive symptom levels did not differ significantly between the groups.
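As a quick consistency check, the participant flow can be tallied in a few lines of Python. All counts are taken from the text above. The comparison with the G*Power minimum of N = 288 simply restates the reported power analysis.

```python
# Participant flow as reported in the Sample section (all counts from the text).
started = 401
excluded = {
    "quit during sociodemographic assessment": 76,
    "missing pre-test data": 24,
    "missing post-test data": 6,
}

n_excluded = sum(excluded.values())
final_n = started - n_excluded
required_n = 288  # minimum sample size from the G*Power analysis

print(n_excluded, final_n, final_n >= required_n)
```

Running it prints `106 295 True`. The exclusions add up to the reported n = 106, and the final sample exceeds the pre-registered minimum.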
5.3. Study Procedure
After providing informed consent, participants reported sociodemographic information, including education, income, living arrangement, and further variables, and psychosis liability was assessed. Subsequently, pre-test assessments were conducted: mood (depression, anxiety), paranoia, and the digit span and n-back tasks.

Participants were then randomized to one of four sound conditions:

1) low diversity traffic noise soundscape;
2) high diversity traffic noise soundscape;
3) low diversity birdsong soundscape;
4) high diversity birdsong soundscape.

For details on the stimuli, see the Stimuli section. Each soundscape lasted exactly 6 minutes. Participants were instructed to set their audio system volume to 80%, which had been piloted beforehand with members of our research unit and deemed an optimal average volume. They were asked to listen to the sounds until the end, after which they continued by clicking the mouse. To assess participants' impatience, the number of clicks during the sound presentation was recorded. In addition, participants were told that a code consisting of two digits spoken in German would be audible towards the end of the sound presentation, and that they would be required to type it in correctly afterwards. This was implemented to ensure listening compliance and attention.

After the sound presentation, the pre-test measures were repeated. Finally, several items assessing perceived sound quality, including beauty, pleasantness, and monotony (vs. diversity), were presented.
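The listening-compliance code is later used to define a robustness subsample: participants who entered at least one of the two digits correctly. The sketch below shows one way to implement that rule. The function name `passed_compliance` is hypothetical, and it assumes each typed digit is compared with the spoken digit at the same position. A position-independent rule would be an equally plausible reading of "at least one of the digits".

```python
def passed_compliance(entered: str, code: str) -> bool:
    """Hypothetical compliance rule: True if at least one of the spoken digits
    was typed correctly at the same position (position-wise matching assumed)."""
    entered = entered.strip()
    return any(
        i < len(entered) and entered[i] == digit
        for i, digit in enumerate(code)
    )


# Example: spoken code "47"
print(passed_compliance("47", "47"))  # both digits correct
print(passed_compliance("41", "47"))  # first digit correct
print(passed_compliance("74", "47"))  # digits swapped: fails under position-wise matching
```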
5.4. Measures
5.4.1 Psychosis liability
Psychosis liability, or subclinical psychosis level, was assessed using the German version of the Community Assessment of Psychic Experiences (CAPE) 18, which assesses lifetime positive, negative, and depressive symptoms (http://www.cape42.homestead.com/index.html). The CAPE, including its German version, has been validated extensively 11. Items refer to the lifetime prevalence of specific symptoms and are rated on an ordinal frequency scale (1 = ‘never’, 2 = ‘sometimes’, 3 = ‘often’, 4 = ‘nearly always’). The full scale consists of 42 items across three subscales:

- positive symptoms, 20 items (e.g., ‘Do you ever feel as if things in magazines or on TV were written especially for you?’);
- negative symptoms, 14 items (e.g., ‘Do you ever feel that your mind is empty?’);
- depressive symptoms, 8 items (e.g., ‘Do you ever feel like a failure?’).

To test whether baseline psychosis liability was comparable across groups, the positive symptom subscale was used 18. It had excellent internal consistency (Cronbach’s α = .90).
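Internal consistency is reported as Cronbach's α throughout this section (CAPE, STADI, paranoia checklist). Its formula is α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The minimal Python sketch below uses the standard library and sample variances. It assumes complete cases, with one row per respondent and one column per item. The authors computed α in SPSS, not with this code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: rows = respondents, columns = items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item (column) and of the respondents' total scores
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents answering two items
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 4))  # -> 0.6667
```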
5.4.2 Mood and paranoid symptoms
Mood was assessed with the State-Trait Anxiety-Depression Inventory (STADI) 19. The inventory comprises 40 items: the same 20 items are presented once in trait and once in state format. Only the state format was used in the present study. The inventory differentiates between depression (low euthymia [inverted items] and dysthymia) and anxiety (hyperarousal and worry), with each subscale assessed by 5 items. Items are rated on a 4-point Likert scale (1 = ‘not at all’, 4 = ‘strongly applies’). Internal consistency (Cronbach’s α) at pre-test was good for both the state anxiety (.85) and state depression (.86) scales.
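The following Python sketch illustrates the scoring logic of the state scales described above. Euthymia items are inverted on the 1–4 scale (5 − x) to represent low euthymia and are then combined with dysthymia into depression. Hyperarousal and worry form anxiety. It assumes sum scores. The item-to-subscale mapping `SUBSCALES` is purely hypothetical and is not the official STADI scoring key.

```python
from typing import Dict

# Hypothetical item-to-subscale mapping (items 1-20); NOT the official STADI key.
SUBSCALES = {
    "euthymia": [1, 5, 9, 13, 17],
    "dysthymia": [2, 6, 10, 14, 18],
    "hyperarousal": [3, 7, 11, 15, 19],
    "worry": [4, 8, 12, 16, 20],
}


def reverse(rating: int, low: int = 1, high: int = 4) -> int:
    """Invert a rating on a low..high scale (1 <-> 4, 2 <-> 3)."""
    return low + high - rating


def score_stadi_state(responses: Dict[int, int]) -> Dict[str, int]:
    """Sum scores (assumed) for state depression and state anxiety."""

    def subscale_sum(name: str, invert: bool = False) -> int:
        ratings = [responses[item] for item in SUBSCALES[name]]
        return sum(reverse(r) if invert else r for r in ratings)

    depression = subscale_sum("euthymia", invert=True) + subscale_sum("dysthymia")
    anxiety = subscale_sum("hyperarousal") + subscale_sum("worry")
    return {"depression": depression, "anxiety": anxiety}


# A respondent answering 1 ('not at all') to every item
print(score_stadi_state({item: 1 for item in range(1, 21)}))
```

Because the euthymia items are inverted, a respondent who rejects every item still receives a moderate depression score. This is exactly why the inversion matters.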
Paranoia was assessed with a brief, change-sensitive state version of the Paranoia Checklist. The scale comprises 3 statements (e.g., ‘I need to be on my guard against others’). Each is rated on an 11-point Likert scale (1-11) for agreement with the statement, associated distress, and conviction. Distress and conviction were only presented if agreement was rated > 1, which typically results in a large amount of missing data for these ratings. In the present study, only agreement was evaluated. Internal consistency at pre-test was acceptable (Cronbach’s α = .78).
5.4.3 Cognition
To assess digit span performance, both the forward and backward versions of the task were used, as implemented in Inquisit 5 17 [retrieved from https://www.millisecond.com]. Both are based on the original task reported by Woods and colleagues 20. Two parameters are recommended for evaluation: the two-error maximum length (TE_ML) and the maximum length recalled (ML). The procedure ran as follows:

- The task started with a sequence of 3 digits presented successively on screen.
- Participants recalled sequences of increasing length, reproducing each by clicking the correct digits in the correct order.
- The task was terminated after two incorrectly recalled sequences of the same length.

Participants were explicitly reminded not to use any memory aids such as paper and pencil.
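The adaptive procedure can be sketched in Python as a function over trial outcomes. It assumes three rules:

- Each correct recall increases the sequence length by one.
- An error repeats the same length.
- Two errors at the same length end the task.

The function name `score_digit_span` is hypothetical. It returns only ML (the longest sequence recalled correctly). TE_ML is not reproduced here because its exact definition in the Inquisit script is not given in the text. This is a sketch, not the Inquisit implementation.

```python
from typing import Iterable


def score_digit_span(outcomes: Iterable[bool], start_length: int = 3) -> int:
    """Return ML, the maximum sequence length recalled correctly.

    Assumed rule: correct -> length + 1; error -> same length repeated;
    two errors at the same length -> task aborted.
    """
    length = start_length
    errors_at_length = 0
    max_length = 0  # stays 0 if no sequence is ever recalled correctly
    for correct in outcomes:
        if correct:
            max_length = max(max_length, length)
            length += 1
            errors_at_length = 0  # new length, so the error count restarts
        else:
            errors_at_length += 1
            if errors_at_length == 2:
                break  # two failures at the same length: task ends
    return max_length


# Lengths 3 and 4 correct, 5 wrong then correct, 6 wrong twice -> ML = 5
print(score_digit_span([True, True, False, True, False, False]))
```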
The dual n-back task, also available in Inquisit 5 17 [retrieved from https://www.millisecond.com], was administered as well. The task is based on the original work by Jaeggi and colleagues 21 and consists of 4 experimental blocks requiring 2-back and 3-back performance. During the task, participants attend to their computer screen while simultaneously listening to audio. On each trial, a blue square appears in one of eight grid-like locations around a central fixation cross, while a (German) letter is presented via the headphones at the same time. In the 2-back condition, participants are instructed to press the “A” key when the current square position matches the position from two trials before, and the “L” key when the spoken letter matches the letter from two trials before. The 3-back condition uses the same instructions, except that stimuli must be matched to those presented three trials before. In the present study, participants practiced each condition once before proceeding to the experimental blocks.

The performance parameter was the so-called d prime value:

d prime = [(visual_TotalHits - visual_TotalFA) + (auditory_TotalHits - auditory_TotalFA)] / 2 / number of experimental blocks

Visual hits are defined as correct responses with respect to the location of the square, and auditory hits as correct responses with respect to the spoken letter. False alarms (FA) are defined as responses in the absence of a target.
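The target rule and the performance score can be sketched in Python. A trial is a target in a modality when its stimulus equals the one n trials earlier. A key press on a target counts as a hit, and a press on a non-target as a false alarm. The score averages hits minus false alarms over the two modalities and divides by the number of blocks, as in the formula above. It is a hits-minus-false-alarms index rather than the z-transformed signal-detection d′. The `Block` structure, the 0-7 coding of grid positions, and the function names are illustrative assumptions, not the Inquisit script's variables.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Block:
    n: int                            # 2 or 3 (n-back level)
    positions: Sequence[int]          # square location per trial (assumed coded 0-7)
    letters: Sequence[str]            # spoken letter per trial
    visual_presses: Sequence[bool]    # "A" key pressed on each trial?
    auditory_presses: Sequence[bool]  # "L" key pressed on each trial?


def hits_and_false_alarms(stimuli: Sequence, presses: Sequence[bool], n: int) -> Tuple[int, int]:
    """Count presses on targets (hits) and on non-targets (false alarms)."""
    hits = false_alarms = 0
    for t, pressed in enumerate(presses):
        if not pressed:
            continue
        is_target = t >= n and stimuli[t] == stimuli[t - n]
        if is_target:
            hits += 1
        else:
            false_alarms += 1
    return hits, false_alarms


def dual_nback_score(blocks: List[Block]) -> float:
    """((visual hits - FA) + (auditory hits - FA)) / 2 / number of blocks."""
    v_hits = v_fa = a_hits = a_fa = 0
    for block in blocks:
        hits, fa = hits_and_false_alarms(block.positions, block.visual_presses, block.n)
        v_hits += hits
        v_fa += fa
        hits, fa = hits_and_false_alarms(block.letters, block.auditory_presses, block.n)
        a_hits += hits
        a_fa += fa
    return ((v_hits - v_fa) + (a_hits - a_fa)) / 2 / len(blocks)


# One hypothetical 2-back block: 2 visual hits, 1 auditory hit, 1 auditory false alarm
example = Block(
    n=2,
    positions=[0, 1, 0, 1, 2],
    letters=["A", "B", "C", "B", "A"],
    visual_presses=[False, False, True, True, False],
    auditory_presses=[False, False, False, True, True],
)
print(dual_nback_score([example]))  # -> 1.0
```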
5.5. Statistical Analyses
Analyses were run in SPSS 27 (IBM Corp., 2020). To test the effects of high- vs. low-diversity traffic noise and birdsong soundscapes on mood, paranoia, and cognition, repeated measures analyses of variance (ANOVAs) were run to test for group × time interaction effects. To check the robustness of findings, the analyses were run first with all participants and then only with those who correctly entered at least one of the two digits of the audio code (compliance check, see Study Procedure).

Significant interactions for any outcome were followed up by post-hoc tests on two levels:

1) within-group pre-post t-tests for each condition (i.e., traffic noise low, traffic noise high, birdsong low, and birdsong high diversity);
2) comparisons of pre-post difference scores between diversity levels within each sound type (e.g., the traffic noise low pre-post difference vs. the traffic noise high pre-post difference).

To explore mean differences in the qualitative soundscape ratings (i.e., beauty, pleasantness, and monotony vs. diversity), a one-way multivariate analysis of variance (MANOVA) was conducted. If the omnibus tests indicated significant differences across the qualitative rating dimensions, follow-up between-group t-tests were conducted. Due to the exploratory nature of the study, no correction for multiple comparisons was applied.
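The two post-hoc levels can be illustrated with standard-library Python. The sketch makes three assumptions:

- The within-condition comparisons are paired pre-post t-tests, with t = mean(d) / (SD(d) / √n) and df = n − 1.
- Difference scores are compared with a Student t-test using pooled variance, with df = n₁ + n₂ − 2.
- The numbers in the examples are toy data.

It computes only the t statistics. p-values would additionally require the t distribution, and SPSS also reports a Welch variant for unequal variances. The analyses themselves were run in SPSS; this is an illustration, not the authors' code.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import List, Sequence


def difference_scores(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Post minus pre for each participant."""
    return [after - before for before, after in zip(pre, post)]


def paired_t(pre: Sequence[float], post: Sequence[float]) -> float:
    """Level 1: within-condition pre-post t statistic (df = n - 1)."""
    d = difference_scores(pre, post)
    return mean(d) / (stdev(d) / sqrt(len(d)))


def pooled_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Level 2: Student t comparing difference scores of two groups (df = nx + ny - 2)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


# Toy data: pre/post scores in one condition, then difference scores of two conditions
print(paired_t([1, 2, 3], [2, 4, 5]))      # approximately 5.0
print(pooled_t([1, 2, 2], [0, 0, 1]))      # approximately 2.83 (= 2 * sqrt(2))
```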