Event counting, such as figuring out how many drops of medicine we are putting in a glass of water or how many times cards of a specific suit appeared during a card game, is a common everyday activity used in many different situations. Logie and Baddeley [1] suggested that event counting engages three types of cognitive processes: firstly, we perceive the item to be counted; then, we have to retrieve a proper counting system from long-term memory; and, finally, the working memory must support the process of maintaining the running total. More specifically, they proposed the phonological loop to be the working memory sub-system responsible for both the process of subvocal articulation of numbers and the maintenance of the running total [2].
There are many situations where we are simultaneously presented with information from multiple sensory modalities. Although a wide range of research has focused on crossmodal interactions across all five senses, at a perceptual, mnemonic, or motor level (see for example [3–7]), to the best of our knowledge, no studies have directly investigated crossmodal interactions in event counting.
The first cognitive process involved in counting, as stated by Logie and Baddeley [1], is the perception of the event or stimulus to be counted. Crossmodal congruence or incongruence has been shown to have a considerable impact on perception [8]. Many studies have demonstrated the existence of crossmodal interplay between auditory features and visual dimensions, such as spatial elevation [9–14], brightness [15,16] or lightness [16–20], size [11,20,21], angularity of shape [16], and direction of movement [22]. The influence of crossmodal congruence or incongruence on perceptual processes is thus well documented, mainly between visual and auditory dimensions.
As mentioned, the second system involved in the counting process [1] is memory. In this case as well, crossmodality seems to be relevant: for example, it has been shown that irrelevant background speech disrupts serial short-term memory for visually presented materials [23]. Other studies have shown the impact of crossmodal effects on working memory performance (e.g. [24]). Interestingly, it seems that when stimuli from different sensory modalities provide additional information, they can facilitate processing [25–29]. However, there are also situations where stimuli presented simultaneously in different modalities can impair performance: when sensory modalities offer conflicting information, stimuli presented in one modality can alter or attenuate processing in the other modality ([30–34]; see also [6]). Much of this research has pointed to visual dominance ([30,35]; see [36] for a review). For instance, in the Colavita visual dominance task [30], participants were asked to press one button in response to a sound and another in response to a light. Interestingly, participants often pressed only the visual button when both stimuli were presented together, suggesting that the visual modality dominates when responding to multisensory information. Other research, on the other hand, shows that auditory processing can dominate over visual processing when a temporal judgment is required. Shams and colleagues [32] showed that the perception of visual stimuli can be distorted by the simultaneous presentation of sounds. In a classic experiment examining the flash illusion, participants see flashes of light while also hearing sounds presented in fast temporal sequences. When participants are presented with a single flash and two sounds, they tend to perceive more than one flash.
To interpret this asymmetry, it has been suggested that auditory and visual modalities share the same pool of attentional resources and compete for them (see [37–39]). Auditory stimuli are often dynamic and transient in nature, whereas visual stimuli are usually presented for longer durations [36]. This account also suggests that the attentional resources automatically captured by the auditory modality come at a cost, such as attenuated visual processing.
In the literature we can find many examples consistent with this interpretation: for example, different task-irrelevant auditory stimuli seem to have detrimental effects on the immediate recall of visually presented materials by interfering with working memory processes [23,40–44]. Lewis [45] investigated interference effects between auditory and visual stimuli in a visual recall task and found that congruency between the two modalities improved recall, whereas incongruency worsened performance. The detrimental effect of task-irrelevant auditory stimuli was also corroborated by Robinson and Parker [46]: they presented visual sequences at different spatial locations, and participants had to respond by touching each stimulus when it appeared. Results showed that response times to visual stimuli were slower when the visual sequences were presented together with tones, which is consistent with the notion that auditory stimuli may disrupt or delay visual processing. In line with these results, Laughery, Pesina and Robinson [47] examined response times and eye tracking variables while participants completed variations of unimodal and crossmodal serial response time tasks. Participants were shown visual sequences: in one condition, the sequence was presented in silence, and in the other condition, the visual sequence was paired with sounds. They found auditory interference, with tones slowing down response times; moreover, participants were slower to fixate on the visual stimuli in the crossmodal condition than in the unimodal condition. In the cases seen so far, auditory stimuli appear to interfere with visual attention, possibly also affecting eye movements and fixations.
The crossmodal interactions between visual and auditory stimuli may also operate in a continuous process such as counting. Verbal counting is a well-learned ability; however, it can be difficult and error-prone, especially for large magnitudes: as the number of objects to be counted increases, it becomes easy to lose track of the running total. Moreover, it has been shown that task-irrelevant background sounds can render the process of counting even more demanding: both articulatory suppression and irrelevant auditory materials are assumed to interfere with the processes necessary for maintaining information in the phonological loop [1]. Another interesting result was obtained by Carlyon and colleagues [48]. In this investigation, participants heard a sequence of tones and saw a sequence of visual stimuli; their task was to count, as instructed, either the number of visual targets or the number of auditory targets, either forward or backward. The results showed better performance when participants counted the auditory targets than in all the other conditions, which is consistent with the auditory dominance found in other temporal tasks. The authors also demonstrated that a reduction in performance can be obtained by allocating attention to a concurrent visual task or by having the participants count backward in threes.
In line with these results, Ljung and colleagues [49] also investigated the effects of background speech on counting. In their study, the authors examined whether the meaning of background speech can affect event counting. In Experiment 2, they asked participants to count dots presented jointly with background speech. They suggested that when background speech is similar in meaning to the focal task process, it contributes to the disruption.
The counting process also seems to be affected by unimodal variables such as so-called fluency, that is, the timing regularity of the events to be counted. Carlson and Cassenti [50] specifically examined these effects: in Experiment 1, they assessed self-paced counting with or without delays that disrupted participants’ preferred pacing; in subsequent experiments, participants counted computer-paced events happening at regular or irregular intervals and were then asked to report their total count. In general, results showed that participants counted regular events more accurately. Stevenson and Carlson [51] investigated fluency as well: they asked participants to count asterisks as they appeared on the screen with regular (isochronous) or irregular Stimulus Onset Asynchronies (SOAs). Participants were then asked to provide a final count and a confidence rating. Their results showed that participants were more accurate and gave higher confidence ratings with regular timings.
Lastly, another variable that may potentially affect counting is speed. Garret [52], in considering the speed-accuracy trade-off, suggested that “Judgement or perception grow in accuracy with the increase in time taken to make it” (p. 1). The first demonstration that accuracy varies according to speed was provided in 1899 by Woodworth [53]. In the case of counting, the speed at which the stimuli to be counted are presented is very likely to have an impact on performance: the faster the stimuli are presented, the more difficult the task becomes and the more error-prone it is likely to be. To the best of our knowledge, no one has explicitly investigated this aspect of counting to date; one reason for this lack of research is that the effect of speed seems so commonsensical as to deserve little interest. However, we find it interesting to include the variable of speed in order to study its effects in association with other variables that can influence the counting process.
Counting can thus be a task prone to errors due to different crossmodal or unimodal features [47] as well as temporal variables such as regularity [50]. In this study, we focus on the crossmodal effect of simultaneously presented visual and auditory information on covert event counting (that is, counting internally without verbalizing the count). Our aim is to verify whether task-irrelevant auditory information can exogenously attract our attention and disrupt the counting process. This is in line with the idea that crossmodal signals can affect sensory processing by directing attention [32].
In general, we also expect the variables speed (slow or fast presentation of the stimuli to be counted) and regularity (regular or irregular SOAs, or rhythmicity; see [50]) to affect participants’ performance. Specifically, the faster the visual events are presented, the more likely counting errors will be. In addition, regular SOAs (isochronicity) should result in better performance than irregular SOAs [50]. Finally, we expect better performance when visual and auditory stimuli are presented synchronously than when their presentation is unsynchronized [54].
To summarize, we have considered three different variables that could affect counting processes: 1. Speed - we hypothesize that increasing presentation speed will result in worse performance; 2. Isochronicity - we hypothesize that counting sequences featuring regular timing will be easier than counting events with irregular intervals [50,51]; 3. Synchrony - we hypothesize that the presence of sounds in synchrony with the visual stimuli to be counted could facilitate performance; on the other hand, a condition where sounds are asynchronous with the visual stimuli could result in an overcount or an undercount (e.g. [8]). More specifically, regarding this last variable, we want to test two different hypotheses about the possible disruptive effects of Synchrony or the lack thereof. These two hypotheses are linked to the ability of sound to exogenously capture our attention, even if task-irrelevant [55–57]. The first hypothesis is that covert counting of visual stimuli can be distorted by the simultaneous presentation of asynchronous sounds that, by exogenously attracting our attention, can trigger the counting process and infiltrate the current count: this infiltration, where irrelevant sounds trigger the counting process more often than necessary, would result in an overcount. The second hypothesis, on the other hand, postulates that task-irrelevant sounds presented asynchronously with the visual stimuli will disrupt the counting process (e.g. [49,58]): this disruption, by temporarily distracting from the counting process, would result in missing some of the visual events and eventually in an undercount. Hence, the results can help us to gain a more complete picture of how asynchronous sounds can affect a visual counting task: if participants tend to overcount, the first hypothesis (infiltration) would be supported, whereas if they undercount, the second one (disruption) would be supported.
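The logic for distinguishing the two hypotheses can be illustrated with a signed counting error, that is, the reported count minus the actual number of visual events. The following is a minimal sketch under stated assumptions, not the analysis used in this study: the function names (`signed_count_error`, `classify_bias`) and the trial values are hypothetical and chosen only for illustration, and a real analysis would rely on an appropriate statistical test rather than on the sign of the mean error alone.

```python
from statistics import mean


def signed_count_error(reported: int, actual: int) -> int:
    """Signed error: positive = overcount, negative = undercount, 0 = correct."""
    return reported - actual


def classify_bias(reported_counts, actual_counts) -> str:
    """Label the direction of the mean signed error across trials.

    Illustrative decision rule only (an assumption, not the authors' method).
    """
    errors = [signed_count_error(r, a)
              for r, a in zip(reported_counts, actual_counts)]
    avg = mean(errors)
    if avg > 0:
        return "overcount (consistent with infiltration)"
    if avg < 0:
        return "undercount (consistent with disruption)"
    return "no directional bias"


# Hypothetical trials in an asynchronous-sound condition (made-up values).
actual = [12, 15, 18, 20]
reported = [13, 15, 20, 21]
print(classify_bias(reported, actual))
```

With these made-up values the signed errors are 1, 0, 2 and 1, so the mean error is positive and the pattern would be read as an overcount.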