Test-Retest reliability of prepulse inhibition (PPI) and PPI correlation with working memory

Sensorimotor gating, a mechanism to filter sensory input and regulate motor output, is experimentally operationalized by the prepulse inhibition (PPI) of the startle response (SR). Previous studies suggest high test-retest reliability of PPI and a potential correlation with working memory (WM). Here we aimed to validate the test-retest reliability of PPI in healthy humans and its correlation with WM performance. We applied an acoustic startle PPI paradigm with four different prepulse intensities (64, 68, 72, and 76 dB) and two different WM tasks (n-back, change detection task [CDT]). We were able to confirm high retest reliability of the PPI with a mean intraclass correlation (ICC) of > 0.80 and a significant positive correlation of PPI with n-back but not with CDT performance. Detailed analysis showed that PPI across all prepulse intensities significantly correlated with both the 2-back and 0-back conditions, suggesting regulation by cross-conditional processes (e.g. attention). However, when removing the 0-back component from the 2-back data, we found a specific and significant correlation with WM for the 76 dB PPI condition. With the present study we were able to confirm the high test-retest reliability of the PPI in humans and could validate and expand on its correlation with WM performance.


Introduction
The startle response (SR) is an evolutionarily conserved reflex occurring across species and has been used to investigate neural substrates of learning and behavior 1. The blink reflex component of the SR can be easily measured by recording the contraction of the orbicularis oculi in response to an unexpected and intense stimulus (e.g. loud noise). If a weak, non-startling prepulse stimulus precedes the startle stimulus, a reduction of the SR is observed; this reduction in SR is called prepulse inhibition (PPI). PPI is thought to reflect a sensorimotor gating process, i.e. irrelevant sensory information is filtered out during early stages of sensory processing 2. PPI is a well-established high-throughput operational measure with high clinical relevance because it has been found to be reduced in different psychiatric disorders including schizophrenia 3, post-traumatic stress disorder 4, Alzheimer's disease 5, bipolar disorder 6 and comorbid anxiety and depressive disorders 7. Furthermore, reductions in PPI have been observed in corresponding animal models for psychiatric disorders 8. Together, these findings open the way for translational studies that might generate mechanistic insights into the above-mentioned disorders.
This also includes the relationship between sensory processing reflected by PPI and higher-order cognitive processes, especially working memory (WM). WM dysfunction is a central cognitive deficit in a number of psychiatric disorders including schizophrenia and bipolar disorder 9. Notably, for schizophrenia, there is clear evidence that disturbances in basic sensory processing contribute considerably to WM impairments [10][11][12]. Furthermore, the translational value of both constructs is underscored by their inclusion in the Research Domain Criteria (RDoC) outlined by the National Institute of Mental Health for the purpose of developing a brain-systems-based psychiatric nosology 13. Here, PPI is considered part of the "Auditory Perception" construct in the cognitive domain, which also explicitly includes WM 14.
Existing studies point to the robustness and the moderate to high retest reliability of the PPI with a mean intraclass correlation (ICC) of 0.68 [15][16][17][18][19][20] and a positive association with cognitive processes 21-24, including WM processes 23. Significant correlation of PPI and WM has been reported in mice 23,25 (but see 26) and rats 27. For healthy subjects, Bitsios et al. 22 reported a correlation of PPI with strategy formation but not with simple perceptual processing. Furthermore, Bak et al. 28 found a positive correlation between PPI and CANTAB Spatial WM task performance. Higher attentional capacity was found to be significantly associated with better PPI 29. Other studies failed to find correlations 30 when considering clinical high-risk groups for psychosis, or only reported co-occurrence of PPI and WM deficits without correlations between the two variables. While the latter finding is explained by the assumption that cognitive (including WM) and PPI deficits represent two independently influencing factors, at least in the presence of a psychiatric disorder, it is unclear why PPI is correlated with WM in healthy subjects. One hypothesis is that it is not WM capacity per se that is correlated with PPI, but more basal components that are prerequisites for successful WM processes, e.g. vigilance and attentional focus 31,32. If these basal components are seriously impaired (as in schizophrenia 9), higher-order skills that are based on these components can no longer be successfully implemented. In healthy subjects, however, PPI capacities vary within a "normal" range and successful task performance might correlate within this range. If this assumption is valid, one would expect a significant PPI-WM correlation in healthy subjects, but these correlations are not necessarily specific for WM and might be mediated by e.g. sustained attention or attention span 32.
In the present study we aimed to assess and replicate the high test-retest reliability of the PPI. Moreover, we investigated the association between the most reliable PPI measure and two different WM tasks (n-back and change detection task [CDT]). Due to the nature of the n-back task, we were able to disentangle higher-level working memory components from lower-level task-specific components and examine correlations of PPI with both components.

Startle response
Averaged SR intensities pre-, peri- and post-PPI main experiment are shown in Fig. 1.

PPI results
Averaged waveforms for blink EMG response to PPI trials separated for session 1 and 2 are shown in Fig. 1 (panel C and D). PPI (in percentage) for each prepulse intensity level and session is presented in

Test-retest reliability
Test-retest reliability of PPI for the single prepulse intensity levels was below 0.80 (see Fig. 2B). For the pooled PPI the ICC increased to 0.78 and increased further to 0.88 when using the average measure ICC of the pooled PPI.

WM performance and correlation with PPI
WM performance data are shown in Table 1. PPI was significantly correlated with n-back task performance (r = 0.54, p < 0.01) but not with CDT performance (see Fig. 2C and 2D). The correlation pattern of the single prepulse intensities and the startle response with n-back parameters is shown in Fig. 2E. While all prepulse intensities were correlated with the 0-back condition, the strongest correlations with the 2-back condition were evident for the higher prepulse intensity levels. Residuals resulting from regressing out the 0-back from the 2-back condition were specifically correlated with the highest prepulse intensity level. The startle response was not significantly correlated with any WM parameter. Performance data of the two working memory tasks showed no significant correlation (r = 0.313, p = 0.179).

Discussion
With the present study we aimed to assess test-retest reliability of the PPI in humans and its correlation with WM performance. We found high test-retest reliability (ICC > .80) when we pooled across the different prepulse levels and sessions. Regarding correlations of PPI with WM performance, we found a significant positive correlation of PPI with n-back task performance but not with CDT performance. Furthermore, a significant correlation of PPI with n-back was evident for both the 2-back and the 0-back condition.
Regarding test-retest reliability of startle response and PPI, both the startle response with N = 10 trials and the startle response pooled across sessions (N = 20 trials) were found to be highly reliable (ICC .80 and .89, respectively). Studies with ≤ 52 trials report a mean ICC of .60. In accordance with these findings, pooling across prepulse levels and sessions, which yielded 80 trials per subject, resulted in a test-retest reliability of .84 in the present study. The test-retest reliability of the individual prepulse levels with a sufficient number of trials remains to be investigated in future studies. Concerning the correlation of PPI with WM, existing studies reported positive correlations in a range from .24 32 to .64 23. Our observed correlation of PPI with n-back task performance (r = .52) is in accordance with these findings. The non-significant correlation of PPI with CDT performance is either an indication of specific WM facets that are correlated with PPI but not present in the CDT, or a consequence of restricted variance in the CDT performance data due to a ceiling effect. A further explanation of these different correlations results from our analysis of control and experimental conditions in the n-back task.
Here we found significant correlations with both the 0-back and the 2-back condition. This indicates that it is not specifically the WM component that correlates with the PPI, but rather basal attentional processes that are present in both conditions. This is in agreement with a finding 32, which suggests that basic attentional processes mediate the correlation between PPI and WM. However, the question remains why these basic attentional processes, which supposedly also occur during CDT, are not correlated there.
Here again, a restricted variance in the CDT performance data or differences in the task-specific nature of attentional processes could play a role. These have a clear visual dominance in the CDT, whereas in the n-back task, in addition to visuo-spatial processing, verbal rehearsal and more complex motor operations occur. In fact, the n-back task has a considerably greater involvement of executive processes, particularly in the 2-back condition 34,35.
Interestingly, we could show that the highest prepulse intensity level (76 dB) a) showed the highest correlation with the 2-back condition and b) was the only condition that showed a significant correlation with 2-back when 0-back was regressed out (Fig. 2E). The effects of a low prepulse intensity (in our experiment 9 dB above background noise) might be explained by reduced conscious detection of these stimuli. Detection of signals has been proposed to be tightly linked with attention, which is also in line with a stronger 0-back correlation.
A few limitations related to our study need to be mentioned. To achieve the necessary number of PPI trials we had to pool across the different prepulse levels and sessions. Others found correlations of specific prepulse level PPIs with WM. Because our trial numbers for the separate prepulse conditions were too low to obtain sufficiently high test-retest reliability, we were not able to perform such fine-grained analyses. Furthermore, pooling across sessions may introduce error due to the 4-week time interval between these sessions. Differences in the actual state of the subjects may have reduced the observed test-retest reliability. However, because we found good reliability, we conclude that the advantages of a higher trial number outweighed the error due to state differences. Nevertheless, in future studies the applied paradigm should be modified to include a larger number of trials within each prepulse intensity condition.
The non-significant correlation between PPI and our second WM task (CDT) may be due to a lack of variance in the CDT performance data (see Table 1). As can be seen in Fig. 2D, the performance data were in a rather narrow range and suggestive of a ceiling effect. Therefore, it would be inadequate to conclude that the PPI is specifically correlated with n-back vs. CDT. Nominally, the correlation of PPI with CDT was also positive, and presumably a larger sample size and a cognitively more demanding, increased set size may have yielded significant results.
To conclude, we found high retest reliability of the PPI with a mean ICC of > 0.80 and a positive correlation with WM processes. Although the correlations of PPI and WM were not specific for WM performance when calculated across all prepulse levels, our analyses indicate that especially the highest PPI level (76 dB) is most strongly associated with WM. Overall, our findings confirm the validity of the PPI construct for translational studies of abnormal sensory information processing in neuropsychiatric disorders. They also emphasize the value of PPI for elucidating the complex relationship between these processes and higher-order cognitive domains of equal translational relevance.

Participants
For the present study, 26 healthy participants were recruited via flyers that were distributed at the University Hospital Frankfurt. Participants were included in the study when they met the following inclusion criteria: age between 18 and 55 years, absence of psychiatric or neurological disorders in subjects or their first-degree family members, right-handedness, no head-related eczema and no implants in the head or cranial region, no untreated thyroid dysfunction and no visual and/or hearing impairments. Pregnancy, drug use in the last 48 h, excessive alcohol consumption on the previous day or intake of medications that impair the ability to concentrate led to exclusion. Finally, a native language other than German, extreme caffeine consumption and unusually short sleep duration also led to exclusion. Three participants were excluded from the analysis after data collection because of uni- or bivariate outlier values in startle or PPI data. After exclusion, 23 participants remained in the analysis; 59.1% women (age in years M = 24; SD = 2.64) and 40.9% men (age in years M = 23.7; SD = 4.03). All participants were university students with a high school diploma as the highest level of education completed.
According to the study protocol approved by the Ethics Committee of the University Hospital Frankfurt (ID = 501/17), all participants had to sign a written consent form after receiving all relevant information in order to participate in the study. The anonymity of the participants was ensured by storing all data in pseudonymized form, while the personal data of the participants appeared only on the informed consent forms. Participants were informed that they were free to withdraw from the study at any time without giving reasons and were offered a monetary incentive of €10 per hour, which they received at the end of the second test session. The test-retest interval was M = 27.1 days (SD = 2.28, range: 21-32).

Working Memory assessment
In the n-back task 41, subjects viewed a series of digits (1-4) presented sequentially for 500 ms (interstimulus interval = 1500 ms). One of the numbers in each frame was highlighted and represented the target number to be maintained in memory. As the sequence progressed, the subject had to indicate via a button press the highlighted number corresponding either to the currently displayed frame (0-back, control condition) or to the frame shown two frames previously (2-back, experimental condition). The stimuli were presented in a block design; each block lasted 28 s and four blocks were presented for each condition. The conditions were alternated, and the total run length was 4 min 16 s. All subjects practiced the task until they gave 60% correct answers in the 2-back condition.
WM performance was calculated as 2-back performance minus 0-back performance. Furthermore, we calculated 2-back residuals by regressing out the 0-back performance from the 2-back performance.
In the canonical change detection task (CDT), three red bars with different orientations were displayed on a computer monitor. Subjects were instructed to memorize the exact orientation of these three red bars. After a variable delay (2800-3200 ms), subjects again saw three red bars. In 50% of the cases, these corresponded exactly to the previously shown bars. In the remaining cases, the orientation of one bar was changed.
Subjects were asked to decide whether or not the orientation of the bars had changed compared to the previously shown bars. If no change in the orientation of the three red bars was noticed, the left mouse button was to be pressed. If the participants noticed that the orientation of one of the three red bars had changed, the right mouse button was to be pressed. During the entire duration of the test, a small black cross was visible in the center of the computer screen, which the participants were asked to fixate throughout. Before the start of each new trial, the fixation cross briefly turned red to announce the impending start of the next trial. CDT working memory performance was calculated based on the number of correct answers.
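The two n-back scores described above can be sketched as follows. This is a minimal illustration, assuming a simple linear regression of 2-back on 0-back accuracy; the function name `residualize` and the accuracy values are hypothetical and are not data from this study.

```python
import numpy as np


def residualize(two_back, zero_back):
    """Return 2-back scores with the linear effect of 0-back removed."""
    y = np.asarray(two_back, dtype=float)
    x = np.asarray(zero_back, dtype=float)
    # Ordinary least-squares fit of y = slope * x + intercept.
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


# Hypothetical proportion-correct values for five subjects (illustration only).
zero_back = np.array([0.98, 0.95, 1.00, 0.90, 0.97])
two_back = np.array([0.85, 0.70, 0.92, 0.60, 0.80])

# Score 1: simple difference between experimental and control condition.
difference_score = two_back - zero_back
# Score 2: 2-back residuals after regressing out 0-back performance.
residuals = residualize(two_back, zero_back)
```

By construction the residuals are uncorrelated with 0-back performance, which is what allows them to be read as the part of 2-back performance not shared with the control condition.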

PPI Testing
Acoustic stimulation was delivered binaurally through wireless headphones (Bose® Quiet Comfort® 25). All sound levels were calibrated by using an artificial ear (Brüel & Kjaer, type 4153). Subjects were continuously presented with 55 dB background noise (broadband white noise). Before the PPI main experiment subjects were presented with six startle stimuli (40 ms; 98 dB broadband noise) at intervals of 8-12 s each. In the PPI main experiment, the following stimuli were presented in a pseudo-randomised order (not more than one of the same type consecutively) with a variable interval of 10-20 s: 10 x startle stimulus, 10 x pre-pulse (20 ms broadband noise at 64, 68, 72, and 76 dB, i.e. 9, 13, 17, and 21 dB above background) followed by a startle stimulus (100 ms after pre-pulse onset), 10 x pre-pulse alone (76 dB) and 10 x no stimulus (3000 ms). After the main PPI experiment six successive startle stimuli at 8-12 s intervals were presented. The intensity of the startle response was determined by measuring the electromyogram activity of the orbicularis oculi using two electrodes under the right eye. The test session lasted for 20 min.
Surface EMG was recorded using a BrainAmp ExG16 amplifier with a sampling rate of 5000 Hz. The startle response was measured from the orbicularis oculi muscle with two 6 mm Ag/AgCl cup electrodes with Elefix paste (Nihon Kohden) placed below the participant's right eye approximately 1 cm under the pupil and 1 cm below the lateral canthus. Relevant skin surface areas were previously treated with a slightly abrasive gel (Nuprep skin preparation gel). All resistances were less than 6 kOhm.
To preprocess and analyze the EMG data, BrainVisionAnalyzer 2.2 Software (Brain Products, 2019) was used. Having collected the raw data for all participants, the first step in the preprocessing was manual artifact rejection, in which all major artifacts were marked and excluded from further analysis. To filter the data, Butterworth Zero Phase Filters (low cutoff: 28 Hz, high cutoff: 450 Hz) and a notch filter (50 Hz) were used. The data were then rectified in order to obtain the same polarity across the dataset, and a 40 Hz lowpass filter was used to smooth the data. The data were segmented per condition, from -100 to 300 ms. After segmentation, the data were baseline corrected (-100 to -500 ms), in order to make sure that the values are comparable across different participants and conditions.
After preprocessing, trial-wise peak responses were extracted in a time window from stimulus onset to 150 ms after stimulus onset. Trials exceeding ±3 SD of baseline activity were excluded from further analysis. Averages calculated from the remaining trials were exported to IBM SPSS (version 25) for further analyses.
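The pseudo-randomised order of the main experiment can be sketched as below. This is only an illustration under stated assumptions: each of the four prepulse intensities is treated as its own trial type with 10 trials, the constraint is read as "no trial type occurs twice in a row", and the trial labels, the function `make_trial_sequence` and the retry strategy are hypothetical rather than the stimulation software actually used.

```python
import random

# Assumed trial types and counts per session (labels are hypothetical).
TRIAL_COUNTS = {
    "startle_alone": 10,
    "pp64_startle": 10,
    "pp68_startle": 10,
    "pp72_startle": 10,
    "pp76_startle": 10,
    "pp76_alone": 10,
    "no_stimulus": 10,
}


def make_trial_sequence(counts, rng, max_attempts=1000):
    """Draw a random order in which no trial type repeats consecutively."""
    for _ in range(max_attempts):
        remaining = dict(counts)
        sequence = []
        previous = None
        while any(remaining.values()):
            candidates = [t for t, c in remaining.items() if c > 0 and t != previous]
            if not candidates:
                break  # dead end: only the previous type is left, so retry
            # Weight by remaining count so all types are used up evenly.
            weights = [remaining[t] for t in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            sequence.append(choice)
            remaining[choice] -= 1
            previous = choice
        else:
            return sequence  # loop ended without a dead end
    raise RuntimeError("no valid sequence found")


rng = random.Random(0)
sequence = make_trial_sequence(TRIAL_COUNTS, rng)
# Variable inter-trial interval of 10-20 s, one value per trial.
intervals_s = [rng.uniform(10.0, 20.0) for _ in sequence]
```

With 70 trials and a mean interval of about 15 s, the main experiment takes roughly 17.5 min, which is compatible with the reported session length of 20 min once the startle-only blocks before and after are added.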
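For readers without access to BrainVision Analyzer, the filtering chain described above could be approximated with SciPy as sketched here. The filter order, the notch quality factor and the use of the 100 ms pre-stimulus interval as baseline are assumptions made for illustration; they are not taken from the study, and the function names are hypothetical.

```python
import numpy as np
from scipy import signal

FS = 5000  # sampling rate in Hz, as reported


def preprocess_emg(raw, fs=FS, order=4, notch_q=30.0):
    """Band-pass, notch, rectify and smooth a continuous EMG trace."""
    # Zero-phase Butterworth band-pass, 28-450 Hz (order is an assumption).
    sos_bp = signal.butter(order, [28, 450], btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos_bp, raw)
    # 50 Hz notch filter (quality factor is an assumption).
    b, a = signal.iirnotch(50, notch_q, fs=fs)
    x = signal.filtfilt(b, a, x)
    # Full-wave rectification so that all deflections have the same polarity.
    x = np.abs(x)
    # 40 Hz zero-phase low-pass to smooth the rectified signal.
    sos_lp = signal.butter(order, 40, btype="lowpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos_lp, x)


def epoch(x, onset_sample, fs=FS, tmin=-0.1, tmax=0.3):
    """Cut one segment around stimulus onset and subtract the pre-stimulus mean."""
    start = onset_sample + int(round(tmin * fs))
    stop = onset_sample + int(round(tmax * fs))
    segment = x[start:stop]
    n_baseline = int(round(-tmin * fs))  # assumed baseline: the pre-stimulus part
    return segment - segment[:n_baseline].mean()


# Synthetic noise standing in for a 2 s recording (illustration only).
raw = np.random.default_rng(0).normal(size=2 * FS)
envelope = preprocess_emg(raw)
segment = epoch(envelope, onset_sample=FS)
```

Second-order sections with forward-backward filtering are used here because they are numerically more stable than transfer-function coefficients for band-pass designs and give zero phase shift, in line with the "zero phase" filters named in the text.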
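Peak extraction and the percentage PPI score can be sketched as follows. The text does not spell out the PPI formula, so the conventional definition (percent reduction of the prepulse-plus-startle response relative to the startle-alone response) is assumed here; the function names and example numbers are hypothetical.

```python
import numpy as np


def peak_amplitude(segment, fs=5000, tmin=-0.1, window=(0.0, 0.150)):
    """Maximum of a segment within a window given in seconds after onset."""
    start = int(round((window[0] - tmin) * fs))
    stop = int(round((window[1] - tmin) * fs))
    return float(np.max(segment[start:stop]))


def percent_ppi(startle_alone_peaks, prepulse_startle_peaks):
    """Assumed conventional formula: 100 * (pulse - prepulse+pulse) / pulse."""
    pulse = float(np.mean(startle_alone_peaks))
    prepulse_pulse = float(np.mean(prepulse_startle_peaks))
    return 100.0 * (pulse - prepulse_pulse) / pulse


# Toy segment from -100 to 300 ms at 5000 Hz (illustration only).
toy_segment = np.zeros(2000)
toy_segment[700] = 5.0    # 40 ms after onset, inside the 0-150 ms window
toy_segment[1800] = 9.0   # 260 ms after onset, outside the window
toy_peak = peak_amplitude(toy_segment)

# Hypothetical mean peaks: a 50% reduction by the prepulse.
toy_ppi = percent_ppi([100.0, 100.0], [40.0, 60.0])
```

Positive values indicate inhibition of the startle response by the prepulse, and a value of 0 indicates no inhibition.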

Statistical analysis
Startle responses (without prepulse) were analyzed by a repeated-measures analysis of variance (ANOVA) with the within-subject factors session (t1 and t2) and position (before, during and after the PPI main experiment). Effects of prepulse intensities were analyzed by a repeated-measures ANOVA with the within-subject factors session (t1 and t2) and prepulse intensity (64, 68, 72 and 76 dB). Alpha was 0.05 and Greenhouse-Geisser correction was used whenever necessary.
Test-retest reliability was assessed using two variants of the ICC, namely ICC(2,1) and ICC(3,1), defined by
$$\mathrm{ICC}(2,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS + k\,(JMS - EMS)/n}$$
$$\mathrm{ICC}(3,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS}$$
where BMS = between-subjects mean square; EMS = error mean square; JMS = session mean square (the original terminology of "J" is "Judge"); k = number of repeated sessions and n = number of subjects.
Thus, in the current study, k = 2 and n = 23.
The calculation of both these variants allowed us to determine the reliability in terms of relative agreement (consistency, ICC(3,1)) or absolute agreement (ICC(2,1)). Both forms of the ICC estimate the correlation of the PPI between sessions, modeled by a two-way ANOVA. In the case of ICC(2,1), both effects (subjects and sessions) are assumed to be random, while for ICC(3,1) the effect of sessions is assumed to be fixed. Following Fleiss (1986), we denote ICC values < 0.4 as poor, 0.4-0.75 as fair to good and > 0.75 as excellent. For further assessment of reliability we also applied the average measure ICC, which corresponds to an estimate of the reliability obtained when the number of trials is doubled 42.
Correlation analyses were calculated using the rank correlation coefficient according to Spearman (two-tailed, alpha = 0.05).
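The ICC variants named above can be computed from the mean squares of a two-way ANOVA as sketched below. This is a minimal illustration and not the SPSS procedure used in the study; the function names are hypothetical, the small ratings matrix is illustrative and is not data from this study, and the average measure ICC is obtained here through the Spearman-Brown step-up, which is algebraically equivalent for both variants.

```python
import numpy as np


def icc_two_way(data):
    """Return (ICC(2,1), ICC(3,1)) for an n subjects x k sessions array."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_sessions = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_subjects - ss_sessions
    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_sessions / (k - 1)            # session ("judge") mean square
    ems = ss_error / ((n - 1) * (k - 1))   # error mean square
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


def spearman_brown(icc_single, k=2):
    """Average measure ICC for k sessions from a single measure ICC."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Illustrative 6 x 4 matrix (not data from this study).
demo = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
demo_icc21, demo_icc31 = icc_two_way(demo)

# Stepping up the reported single measure ICC of 0.78 with k = 2 sessions.
pooled_average_icc = spearman_brown(0.78, k=2)
```

Applied to the single measure ICC of 0.78 reported for the pooled PPI, the step-up gives 2 × 0.78 / 1.78 ≈ 0.88, which agrees with the average measure ICC reported in the results.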
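A two-tailed Spearman rank correlation of this kind can be reproduced with SciPy as sketched below. The wrapper `spearman_test` and the example values are hypothetical and serve only to show the call; they are not data from this study.

```python
import numpy as np
from scipy import stats


def spearman_test(x, y, alpha=0.05):
    """Two-tailed Spearman rank correlation with a significance flag."""
    rho, p_value = stats.spearmanr(x, y)  # two-sided by default
    return float(rho), float(p_value), bool(p_value < alpha)


# Hypothetical values for six subjects (illustration only).
ppi_percent = np.array([20.0, 35.0, 41.0, 48.0, 55.0, 62.0])
nback_score = np.array([0.55, 0.60, 0.72, 0.75, 0.81, 0.90])

rho, p_value, significant = spearman_test(ppi_percent, nback_score)
```

Because the coefficient is computed on ranks, it is insensitive to monotone transformations of either variable, which makes it a reasonable choice for bounded accuracy scores in a small sample.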

Declarations
Author contributions
FF, HA, KF and MMP have written the manuscript. KF has carried out the study. KF and MMP have analyzed the data. RB contributed the CDT and gave significant input in writing the manuscript. AR, FF, HA and MMP designed the study and planned its content.

Figure 1
Averaged waveforms for blink EMG response to startle stimuli (1A and 1B) pre-, peri- and post-PPI main experiment and to PPI trials (1C and 1D).