Based on the top-down model of visual processing, we hypothesized that pre-existing knowledge that an object affords grasping, and is located within a graspable region of space, would selectively bias low-level visual processing in favour of features highly relevant for action production by the dorsal stream, such as orientation, at the expense of features more relevant for object recognition by the ventral stream, such as saturation. Participants completed a visual change detection task on a monitor located close to or far from their body. The monitor depicted images of individual objects that either strongly afforded grasping or did not, and participants indicated via a key press with their right hand when they perceived a change in the object’s orientation or saturation. The results provide partial support for these hypotheses: once stimuli that encouraged bottom-up attentional strategies were at least partially accounted for, participants were more accurate at identifying orientation changes when the object was graspable, but faster at identifying saturation changes when the object was non-graspable. These results support the idea that high-level knowledge of object affordances can selectively bias low-level visual processing in favour of visual features more relevant for either action production by the dorsal stream or object recognition by the ventral stream.
The present study featured several strengths that allowed us to investigate the top-down influence of object affordances on participants’ ability to detect a change in object orientation or saturation. First, the orientation variations used in the present study were designed to address ceiling effects observed in previous research from our lab (Bamford et al., in prep). Bamford et al. included orientation variations where the stimuli were presented upright on a distinct base. The everyday familiarity of these object positions may have enhanced participants’ detection of orientation changes, as deviations from these “ideal” positions would have been readily apparent. In the present study, none of the orientation variations assumed an obvious position (i.e., 0°, 90°, or 180°), thereby eliminating this ceiling effect in the orientation condition. Second, participants in the present study responded with their right hand only, as early IPS activity and associated top-down processes (Liu et al., 2017), as well as processes involved in the planning and execution of motor actions (Janssen et al., 2010), are known to be lateralized to the left hemisphere, which controls the right hand in right-handed participants. Third, the stimuli used in the present study were matched to control for complex influences of object animacy and real-world size on ventral stream processing (Konkle & Caramazza, 2013). For instance, objects with a small real-world size may be more readily connected to reach-and-grasp networks, whereas objects with a large real-world size may be more readily connected to navigational networks in the brain (Konkle & Caramazza, 2013). Thus, failing to control for the real-world size of stimuli could confound object graspability with object real-world size. However, this only seems to apply to inanimate objects (Konkle & Caramazza, 2013), or perhaps more accurately, non-motile stimuli (Shatek et al., 2022; Yorek et al., 2009). Thus, to avoid confounds involving real-world size, we chose objects that were all conceptually inanimate (i.e., non-motile) and matched for comparable real-world size. Fourth, the stimuli in the present study consisted of photorealistic images of real objects. Previous research has found that stimuli composed of mostly LSF information, like line drawings (Salmon et al., 2014), tend to bias visual processing toward the magnocellular pathway, while stimuli composed of mostly HSF information tend to bias processing toward the parvocellular pathway (Goffaux et al., 2011; Kauffmann et al., 2014). Thus, the stimuli used in the present study, which contained a mix of both LSF and HSF features, should have reduced these lower-level bottom-up biases, helping to ensure that any differences in participant responses to the graspable vs. non-graspable stimuli could be more readily attributed to differences in higher-level knowledge of object affordances.
Still, there are some limitations that should be considered. Although all the objects in the present study are common and readily identifiable to most people, we did not directly assess individual participants’ familiarity with the objects, nor their confidence in their ability to use the graspable objects. Participants who have never seen or used a graspable object may be unable to access the top-down affordances associated with that object, and as a result the brain may bias processing towards the ventral stream (Riener, 2019) instead of the dorsal stream as predicted. Additionally, prior experience physically interacting with objects may be especially critical for strengthening object affordances when visual information is relatively unreliable (Takahashi & Watt, 2012). Thus, the fact that we did not prime participants by having them physically interact with the stimulus objects may have weakened the graspability manipulation in the present study. This may have been further exacerbated by the fact that the objects were presented as two-dimensional (2D) images rather than real, tangible objects. Gomez et al. (2017) found that 2D images are less capable than real objects of drawing participants’ attention and producing action responses. Thus, future research should aim to increase the power of the graspability manipulation by priming participants through physical interaction with real-world objects and by using real objects, 3D virtual objects, or 3D photographs instead of 2D images.
Despite the fact that the difficulty of the orientation detection task was increased relative to previous research (e.g., Bamford et al., in prep), the initial results of the present study revealed that participants were still consistently faster and more accurate at detecting changes in object orientation compared to saturation, regardless of whether the object was graspable or not. There may be multiple reasons for this. First, LSF features that are highly related to pure physical affordances like orientation (Skottun & Skoyles, 2011; Wang et al., 2024) are rapidly transmitted to the extrastriate cortex via the magnocellular pathway, which, as mentioned previously, is thought to contribute to the rapid formation of a gist representation of the object’s shape (Chan et al., 2013; Kveraga et al., 2007; Liu et al., 2017; Trapp & Bar, 2015). This gist representation may be sufficient to enable a fast and accurate judgement about a change in the object’s orientation, but not its saturation. Relatedly, Nilsson (2022) suggests that the evolution of visual processing began with basic “ancient vision” and progressed into more sophisticated “object vision”. Visual processing related to orientation detection reflects a form of ancient vision in which low-resolution inputs are sufficient to enable the detection of changes in an object’s outline, whereas visual processing related to object recognition (e.g., saturation detection) is more optically sophisticated and evolved later. In the same vein, Montare (2016) provides evidence for an evolutionarily ancient primary reaction time system that is highly attuned and rapidly responsive to objects in motion (as an object undergoing a change in orientation may be perceived to be), versus a more recently evolved secondary reaction time system that is slower and more highly attuned to colour changes in motionless objects (e.g., objects that undergo a change in saturation). Together, these works suggest that natural selection may have favoured the earlier and more robust evolution of a visual system highly attuned and responsive to objects in motion. Thus, the objects in the present study may have been perceived as being in motion when they underwent a change in orientation, eliciting more accurate and rapid responses from evolutionarily ancient visual systems than when the same objects simply underwent a change in colour saturation but did not appear to move.
In contrast to our initial predictions, participants were typically better at detecting changes in both orientation and saturation when the objects were located beyond reach in EP space. These results conflict with previous findings. For example, Kelly and Brockmole (2014) found that participants’ orientation memory improved in PP space, while parvocellular colour memory improved in EP space, suggesting that objects beyond PP space should bias visual processing toward features relevant for perception (e.g., saturation). One reason for this discrepancy may be that, when the object images were presented in EP space in the present study, their physical size relative to the display monitor was greatly increased to equate retinal image size across the PP and EP space conditions. Participants in the EP condition may therefore have been better able to compare the orientation and colour saturation of the test images against the black borders surrounding the display monitor. If so, this would indicate that participants were using an unexpected bottom-up attentional strategy that improved detection of both orientation and saturation changes in EP compared to PP space. Future research using eye tracking could confirm whether participants spend more time looking at the display borders in the EP compared to the PP condition. Alternatively, Serino et al. (2007) found changes in bimodal receptive field properties following tool use (e.g., a cane), whereby the tool extended participants’ action space (PP space). This form of tool embodiment can expand PP space up to 240 cm (Seraglia et al., 2012). In the present study, the use of a wired keyboard that participants knew could manipulate a monitor located beyond reach may have enlarged their PP space representation, effectively eliminating the EP space condition altogether.
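As a minimal illustration of this size control (the distances below are hypothetical, not the study’s actual viewing distances), holding retinal image size constant means holding the visual angle $\theta$ constant, so the required stimulus height $h$ scales linearly with viewing distance $d$:

$$
\theta = 2\arctan\!\left(\frac{h}{2d}\right)
\quad\Longrightarrow\quad
h_{EP} = h_{PP}\,\frac{d_{EP}}{d_{PP}}.
$$

For example, if the EP monitor were three times farther away than the PP monitor, the images would need to be rendered three times larger, which is why the EP images occupied a much greater proportion of the display and its surrounding borders.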
Regardless of object Graspability or Location, participants were typically faster and more accurate at detecting orientation changes, with the exception of the ping-pong paddle, for which participants were better at detecting saturation changes (Fig. 7). This was highly surprising, as the paddle was arguably the most “graspable” object in the entire stimulus array and was expected to elicit a strong bias towards orientation detection. A potential explanation for this exception emerged from the participant feedback survey. Multiple participants reported that they used bottom-up strategies to complete the task, particularly on the orientation trials. For example, one participant reported: “orientation tasks were easier if I focused on the straight parts of objects (e.g., the straight blue line in the middle of the paddle’s handle).” The fact that most of the objects contained distinct visual features in their upper regions (e.g., the straight edge atop the birdbath) may have enabled participants to use the bottom-up strategy of monitoring these specific features for orientation changes. Notably, attention to a feature in the upper portion of the stimulus would likely enhance performance on the orientation task because, when the object rotates clockwise, the attended feature appears to move to the right, which is associated with a congruent right-sided key press (Fig. 8, top). As the top of the ping-pong paddle did not contain any distinct visual features to aid orientation detection, attention to the paddle’s handle in the lower portion of the stimulus would have produced the opposite outcome: a clockwise rotation of the paddle would cause the attended feature (the handle) to appear to move to the left, which is associated with an incongruent left-sided key press (Fig. 8, bottom). This likely explains the decreased orientation detection performance for the ping-pong paddle.
Consequently, the original statistical analyses were re-run without the paddle. This revealed that participants were more accurate at detecting orientation changes in graspable objects, but faster at detecting saturation changes in non-graspable objects (Figs. 9 & 10). These results provide partial support for our original hypotheses and align with recent research suggesting that viewing objects, or images of objects, that afford action biases neural processing towards the magnocellular pathway (Chan et al., 2013; Dubbelde & Shomstein, 2022) and the dorsal stream (Almeida et al., 2014; Noppeney et al., 2006; Rice et al., 2007), and can also modulate visuomotor processing in a top-down manner (Foerster & Goslin, 2021). They also align with research indicating that planning and/or executing specific actions towards a real object can bias visual processing towards action-relevant features of that object in visual search tasks (e.g., Bekkering & Neggers, 2002; Wykowska et al., 2009, 2011).
In sum, the present research extends previous work on top-down visual processing by providing behavioural evidence that high-level knowledge of an object’s affordances can bias low-level visual processing in favour of action-relevant features (e.g., orientation), possibly to facilitate subsequent action production by the dorsal stream, and potentially at the expense of less action-relevant features (e.g., saturation) used by the ventral stream for object recognition. However, humans are opportunistic and will adopt a variety of attentional strategies, including bottom-up ones, to complete visual cognition tasks, and these strategies can substantially obscure experimental results. Thus, future research should consider including manipulations that prime knowledge of object affordances to encourage top-down processing, while fully controlling for lower-level visual features and incorporating catch trials that require participants to attend to the whole stimulus object rather than specific visual features, thereby preventing them from adopting bottom-up attentional strategies to complete the task. Furthermore, developing an inventory of the bottom-up strategies that participants are prone to using in different visual cognition tasks would enable more powerful experimental designs that effectively prevent these strategies while increasing the overall salience of top-down processing.
In an applied sense, many new technologies, including drone-based delivery services, autonomous vehicles, and facial recognition systems, aim to mimic the human visual system. If these technologies are developed with an overemphasis on the bottom-up model of visual processing, the result could be functional flaws with potentially grave consequences. Insight into how top-down processing operates can help ensure sound technological advancement in machines that must recognize and interact with objects in the same manner as humans. It may also aid other fields, such as clinical psychology and rehabilitative medicine, by helping to inform the development of therapies that aim to restore functional vision in patients with cerebral visual impairment. Additionally, a better understanding of top-down processing and “gist” object recognition could aid researchers in diagnosing and developing treatments for individuals with schizophrenia and other hallucinatory disorders, which may be related to an imbalance between bottom-up and top-down processing (Adamek et al., 2022).