In the visual world, humans are constantly exposed to complex environments composed of a multitude of stimuli that require varying levels of attention. Given limited attentional resources, observers must rapidly select which stimuli warrant attention. A growing body of recent literature has shown that the allocation of attention within dynamic visual scenes can be driven by factors such as behavioural relevance, scene semantics, or emotional significance (e.g., Beitner et al., 2021; Vater et al., 2022; Yu et al., 2023).
It is well established that attention can be biased towards individual objects that convey inherent meaning. For example, a face with an angry expression attracts more attention than a face with a neutral expression (Okon-Singer et al., 2020). In addition, neutral, arbitrary objects that lack such inherent attention-attracting properties can become behaviourally relevant through learning in a specific context. An abstract experimental stimulus (e.g., a black square) that has previously been associated with a positive or negative outcome (e.g., monetary reward, electric shock) can gain preferential attention in behavioural tasks compared to objects that have not been exposed to such an association (Anderson et al., 2021; Failing & Theeuwes, 2018; Grégoire et al., 2021; Le Pelley & Newell, 2023; Schmidt et al., 2015).
Learning about the behavioural relevance of otherwise neutral objects can also occur outside the laboratory, in naturalistic settings encountered through daily behaviour (Helbing et al., 2020; Kristjánsson & Draschkow, 2021). To better capture realistic stimulus-context learning, researchers have demonstrated that real-world stimuli can elicit attentional guidance patterns that relate to participants’ previous lifetime experiences. For example, researchers have presented participants with natural three-dimensional (3-D) scenes (e.g., a living room) and observed where participants fixated throughout the scene. Hayes and Henderson (2022) revealed that when freely viewing 3-D scenes, participants’ fixations were predominantly concentrated in locations of the scene that would be behaviourally relevant (e.g., a staircase) rather than in less behaviourally relevant areas (e.g., an empty corner of the room).
In experimental settings with more specific task requirements, it has been demonstrated that the learned selectivity of attention can be context-dependent. The most widely studied approach uses action to induce a motor-specific context. When an upcoming action is anticipated, attention to features relevant to that action is facilitated within intermediate attention tasks. For example, Fagioli and Hommel (2007) demonstrated that while participants prepared for either a grasping or a pointing action, concurrent perceptual task performance was facilitated for target features associated with the prepared action: target size was processed better before a grasping action, and target location was processed better before a pointing action (see also Feldmann-Wüstefeld & Schubö, 2015; Han et al., 2020; Job et al., 2019).
In Fagioli and Hommel (2007), the action context was provided through the overt behaviours (grasping and pointing) required in the experiment. It has also been shown that attention can be modulated by visual stimuli that imply motor action, even when the probability of an actual motor act is very low. This is because internal models of behaviour are intrinsically activated when observing implied actions (e.g., seeing a posture indicative of a movement), oftentimes biasing attention toward locations or features that observers expect to result from the implied action (Bach & Schenke, 2017; Bach et al., 2014). For example, explicitly threatening body postures have been shown to reflexively guide attention, likely because participants anticipate the outcome of certain postures (e.g., a kick to the left) despite never observing any actions (Azarian et al., 2016; Bannerman et al., 2010; Wang et al., 2018). These results highlight that accumulated experience with explicit actions can result in attentional guidance due to implicit behavioural ramifications, regardless of the experimental context or of whether any action is executed or observed.
In the studies described above, the modulation of attention was triggered by individual action-related stimuli alone (e.g., a body posture with implied action). In the current study, we explored whether attention could be modulated by a broader task context when the target by itself does not carry motor implications outside that context. In other words, the implied action is present only when a certain behavioural context is combined with a key stimulus feature that would otherwise not imply action.
We chose driving as the testing context because driving is an over-learned behaviour, and experienced drivers would likely develop an “attention set” that optimizes preparation for a possible motor response should the need arise (Crundall et al., 2012). In particular, this well-developed attention set would be activated automatically even in a simulated driving task. Moreover, such selectivity would be apparent only in certain contexts and not in other (non-driving) contexts. In addition, drivers would likely modulate their allocation of attention well before an overt motor response (e.g., a change in driving direction or speed) is needed, even when the probability that this motor action will be needed is very low.
In the current study, we used a modified spatial cueing paradigm in which a peripheral spatial cue preceded the onset of the roadside target. In a typical version of this paradigm, participants respond to the onset of a target appearing in the peripheral visual field following the onset of a spatially uninformative cue at the same location as the target or at a different location. A great number of studies have examined how, compared to a neutral stimulus, a stimulus with negative valence (e.g., angry faces, snakes, threatening words), acting as a cue or a target, influences target detection, localization, and discrimination/categorization (e.g., Okon-Singer et al., 2020; Pérez-Dueñas et al., 2014).
Spatial cueing tasks are a more sensitive measure of attention across depth (see Britt & Sun, 2024; Chen & Wyble, 2018). In our task, a neutral cue (a cylinder) was presented before the target, in either the same or the opposite hemifield. In the experimental condition of the current study, licensed drivers drove in a driving simulator, moving forward along a straight road in a 3-D virtual reality environment. The target was a pedestrian-like avatar appearing on the roadside some distance ahead of the driver’s viewpoint. We measured drivers’ performance (reaction time) in discriminating a target feature (the pedestrian’s arm position) that was not directly relevant to driving.
The most important (task-irrelevant) manipulation in the current study was the orientation of the pedestrian, who could be facing either toward the road (inward) or away from the road (outward). Although the pedestrian never moved in the current experiment, in the real world the combination of a driver moving along the road and a pedestrian with an inward orientation creates, in theory, the possibility of an impending collision should the pedestrian step out from the roadside; pedestrians are a common roadside hazard while driving (Song et al., 2023). In contrast, an impending collision would be unlikely when the pedestrian assumed the outward orientation. Therefore, a performance difference between the inward and outward orientations would indicate modulation of attention in a context-dependent manner. We anticipated that the inward orientation would garner more attention as a safety precaution because it would be perceived as an implicit hazard.
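To make the logic of these comparisons concrete, the following is a minimal sketch of how reaction times from such a design could be summarised by pedestrian orientation and cue location. It is an illustration under stated assumptions, not the analysis used in the study: the trial values are invented, and the labels (`same`, `opposite`) and helper names (`mean_rt_by_cell`, `cueing_effect`, `orientation_effect`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (orientation, cue_location, reaction_time_ms).
# "same"/"opposite" is the cue's hemifield relative to the target.
# These values are invented for illustration; they are not data from the study.
trials = [
    ("inward", "same", 520.0),
    ("inward", "opposite", 560.0),
    ("inward", "same", 500.0),
    ("inward", "opposite", 580.0),
    ("outward", "same", 540.0),
    ("outward", "opposite", 570.0),
    ("outward", "same", 560.0),
    ("outward", "opposite", 590.0),
]


def mean_rt_by_cell(records):
    """Average reaction time for each (orientation, cue_location) cell."""
    cells = defaultdict(list)
    for orientation, cue_location, rt in records:
        cells[(orientation, cue_location)].append(rt)
    return {cell: mean(rts) for cell, rts in cells.items()}


def cueing_effect(cell_means, orientation):
    """Opposite-hemifield RT minus same-hemifield RT for one orientation."""
    return cell_means[(orientation, "opposite")] - cell_means[(orientation, "same")]


def orientation_effect(cell_means):
    """Outward RT minus inward RT, averaged over cue locations.

    A positive value means faster responses to inward-facing targets.
    """
    inward = mean(v for (o, _), v in cell_means.items() if o == "inward")
    outward = mean(v for (o, _), v in cell_means.items() if o == "outward")
    return outward - inward


cell_means = mean_rt_by_cell(trials)
print(cueing_effect(cell_means, "inward"))
print(cueing_effect(cell_means, "outward"))
print(orientation_effect(cell_means))
```

In this sketch, a positive orientation effect in the driving condition that is absent in a control condition would be the pattern consistent with context-dependent modulation of attention.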
In addition to the experimental condition implemented in each of the three experiments in the current study, we implemented three control conditions in which the target orientation also varied between inward and outward, but the behavioural context was removed in one of three ways: a 2-D stationary context with an isolated pedestrian target (Experiment 1), a 3-D stationary context with a pedestrian target (Experiment 2), and driving with an inanimate Light-Post target (Experiment 3). See Fig. 1.
The control condition in Experiment 1 was intended to test whether the difference in the visual appearance of the target alone (without 3-D context) between the inward and outward orientations would generate a difference in processing speed. The control condition in Experiment 2 was intended to test whether attention modulation could still take place with the same 3-D information as in the experimental condition but without the motor act of driving. The control condition in Experiment 3 was intended to examine whether forward self-motion alone would lead to an attentional bias.