We employed a simple paradigm to assess how manipulating grouping cues during ‘build-up’ influenced subsequent sound segregation, as measured by a tone detection task. We found that the grouping cues available during build-up critically shaped subsequent perception. Eliminating temporal predictability during build-up significantly hampered sound segregation and elevated tone detection thresholds, relative to conditions with temporal predictability in the signal band. In addition, disrupting temporal coherence between signal and flanker bands during build-up also produced significant increases in thresholds. Together, these results demonstrate the influence of early sound statistics on subsequent sound perception.
The general pattern of results observed here was consistent with the idea that object formation during build-up strongly influences subsequent perception, at least within the relatively recent context. Conditions that discouraged grouping of the signal and flanking bands, i.e. TP+SIGTC− and TP−SIGTC−, produced thresholds comparable to those observed in the narrowband conditions. For example, build-up could be used to block the subsequent use of temporal coherence cues (last paragraph of the results). Previous work has demonstrated that surrounding a period of temporal coherence with random temporal structure (similar to TP−TC−) disrupts the benefits of temporal coherence37,38. One could argue that a lack of temporal coherence during build-up (TC−) indicated to the auditory system that the signal and flanking bands should not be grouped (as there is no evidence that they should be). However, this is insufficient to explain the results of Experiment 1, where the temporal coherence benefit was comparable for the jittered and predictable comparisons (see Fig. 1C). Here, the lack of interaction demonstrates that bandwidening produced the same effect size regardless of the coupling (TC+) or decoupling (TC−) of channels during build-up. This suggests that the ability to utilize across-channel grouping cues in the masker was not blocked by their absence in the build-up period. In Experiment 2, thresholds were higher for the TP−SIGTC− condition than for the TP−TC− condition, suggesting that if an object that did not include both signal and flanker bands had been established during the precursor (through temporal predictability in the flanker or the signal, but not both), then this could block the subsequent use of temporal coherence in the masker. This suggests that object permanence of temporal stability is not overridden by temporal coherence, at least for the limited set of parameters used here. This might be an exception to the temporal coherence model, i.e. “that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source”4, as object permanence can block the use of temporal coherence to form streams.
It should be noted that the masker period had a very short build-up (between the offset of the precursor and the onset of the tone) of 187.5 milliseconds. It may be that the large change in sound statistics is sufficient to “reset” build-up, similar to resetting of stream segregation25,26, and that there is time for temporal coherence to build up and allow the unmasking observed in Experiment 1, with no additional enhancement or interference due to jitter/predictability in the precursor. Previous studies have suggested that unmasking due to precursors is initially a “temporal decline in masking” over the first ~400 ms (also observed for sounds with no temporal coherence), followed by a large threshold decrease (in a subset of participants) after 400 ms39,40. If there is a build-up benefit to temporal coherence, then there was not sufficient time in this experiment for it to manifest (in just the masker) and, hence, it is unlikely that we observed an additional enhancement from build-up in the masker beyond a temporal decline in masking. We observed that enhancement due to temporal coherence could be blocked by manipulating the precursor, as described previously37–39. However, this is not “true” interference (i.e. an elevation in threshold relative to a neutral condition) but rather a loss of enhancement. More work is needed to test whether, or to what extent, interference of temporal coherence is a simultaneous or a sequential phenomenon.
Other work has also demonstrated the type of interference effects observed here. Grose et al.38 presented 4 continuous narrow bands of noise (20 Hz wide, centred at 804, 1200, 1747 and 2503 Hz) with a temporal structure that was switched from being comodulated (i.e. temporally coherent) to randomly modulated, while trained participants were asked to detect pure tones. The authors found that incoherently modulated precursors/postcursors (both were modified in each trial) produced a small interference with comodulation masking release, elevating thresholds by ~2 dB. Subsequent work by largely the same authors37 demonstrated a similar effect in highly trained participants (n=4), though with a much larger effect size (~10 dB). Similarly, a large interference (9.55 dB) was observed in our study when replacing a comodulated build-up with a temporally jittered one (n=19, minimal training provided). Our work, which only looked at the influence of precursor sounds, also demonstrated an interference attributable to disruption of temporal predictability in the build-up (Fig. 1D), suggesting that the interference observed by Grose and colleagues is largely attributable to build-up rather than postcursor interference.
As mentioned already, in Experiment 1 we found no evidence of the build-up period promoting or interfering with the ability to benefit from temporal coherence in the masker. In addition, in Experiment 2, removing temporal coherence while keeping temporal predictability in the signal band (TP+SIGTC−) did not produce significant interference, whereas removing temporal predictability from the signal band produced the greatest interference. This suggests that either interference/enhancement due to build-up has a different time integration window or that temporal predictability is a relatively instantaneous cue. Frequently in the study of temporal coherence, particularly in streaming, the temporally coherent stimulus is also temporally predictable (during build-up and thereafter), meaning it possesses both temporal predictability and temporal coherence. Our data highlight the need to separate these two cues when studying temporal coherence.
In conclusion, this work supports the idea that build-up is a critical process for shaping subsequent perception. Manipulating temporal coherence and predictability during build-up produces independent changes in enhancement and interference of subsequent tone detection (Experiment 1). In addition, our results suggest that temporal predictability influences perception over a longer timescale than temporal coherence alone and can produce large interference in subsequent perception. Further work is needed to understand to what extent stimulus history can influence the use of temporal coherence cues.