General approach
Our goal was to bring together established RT and oculomotor measures of the contextual-cueing effect, which focus on group mean values, with new oculomotor-scanpath measures that quantitatively describe search behavior in a more fine-grained manner, in particular at the level of individual displays or individual participants (see Figure 1A and B). Additionally, by replicating established measures from the literature (e.g., Peterson & Kramer, 2001; Tseng & Li, 2004; Brockmole & Henderson, 2006; Myers & Gray, 2010), we aimed to ensure that our data are representative of contextual-cueing studies at large, thus increasing confidence in the generality of our novel analyses and findings.
For our novel analysis approach to be feasible, we adjusted the experimental design in two respects: First, and motivated by a previous study of contextual cueing (Sewell et al., 2018), we reduced the number of learnable, repeated target locations, as well as the number of target locations in non-learnable, non-repeated displays to four each, with one target location per display quadrant; this was meant to ensure that the memory signals for the respective target location and the corresponding (possibly display-specific) scanpath would have as little interference from other repeated displays as possible and that allocation of attention over space and time would be maximally different. Second, we presented the same repeated and non-repeated display arrangements to all participants, in the same trial order. Using the same display arrangements allowed us to control the perceptual content of the display set throughout the experiment; in particular, using the same arrangements for non-repeated displays ensured a “fair” comparison between scanpaths, eliminating confounds originating from, across participants, variably composed distractor-target configurations in non-repeated displays. Methodologically, these adjustments made it possible to compare pairs of scanpaths at different levels and relating to (1) the similarity of fixation sequences through an individual display when viewed by pairs of different participants and (2) the similarity of scanpaths for an individual participant viewing (pairs of) different displays. These design measures enabled us to perform a thorough test of the contrasting predictions made by the specific and the generic procedural-optimization accounts.
Participants
The sample size was determined based on previous CC studies using relatively large sample sizes (e.g., Vadillo, Malejka, Lee, Dienes & Shanks, 2021; Peterson, Mead, Kelly, Esser-Adomako & Blumberg, 2022) in order to obtain reasonably stable estimates of contextual cueing. High statistical power was necessary since we analyzed contextual cueing at the level of individual displays. We recruited 46 participants for the present experiment (38 females, 3 left-handed, mean age = 23.28 [SD = 5.62, range = 19–43] years).
Apparatus and stimuli
The experimental routine was programmed in Matlab with Psychtoolbox extensions (Brainard, 1997; Pelli, 1997) and run on an Intel PC under the Windows-7 operating system. Participants were seated in a dimly lit laboratory booth in front of a 19-inch CRT monitor (AOC, Amsterdam; display resolution 1024 x 768 pixels; refresh rate: 85 Hz) at a viewing distance of 60 cm (controlled by a chin rest). The search displays consisted of 12 gray items (luminance: 1.0 cd/m²; 1 target and 11 distractors) presented against a black background (0.11 cd/m²). All stimuli extended 0.35° of visual angle in both width and height. As depicted in Figure 1C, the items were arranged on three (invisible) concentric circles around the display center (with radii of 1.74°, 3.48°, and 5.22° for circles 1, 2, and 3, respectively). In repeated displays, the locations and orientations of the distractors were held constant across trials; in non-repeated displays, all distractors (i.e., their locations and orientations) were generated anew on each trial. Note that in all presented displays, the location of the target was repeated but the (left/right) orientation of the target was determined randomly and was, thus, unpredictable. As a result, a repeated context could only be associated with a specific (repeated) target location, but not with a specific target identity. Following Chun and Jiang (1998), this approach is used in most CC studies to ensure that contextual facilitation of RTs is due to the repeated context guiding attention/the eyes, rather than facilitating the selection of the manual response (invariably) associated with a given repeated display. Importantly, both the set of (N=4) repeated displays and the set of (N=128) randomly generated non-repeated displays were kept constant across all 46 participants, so that each participant encountered identical repeated and non-repeated configurations.
Note, however, that trial order was randomly chosen within each block of N=4 repeated plus N=4 non-repeated trials for individual participants. This enabled us to keep low-level, individual display properties constant across participants and thus compute dependent – scanpath – measures for each individual display (with variations between participants providing the error term).
There were overall 8 possible target locations, 4 of which were used for repeated displays (with constant distractor layouts) and the other 4 for non-repeated displays (with random distractor arrangements). All targets, in both repeated and non-repeated displays, were located on the second ring, controlling for the distance of the target from the display center in all conditions. Furthermore, the targets were placed in all four quadrants with equal probability. Importantly, participants were not informed about the fact that some of the search arrays were presented repeatedly. The “T” target was rotated randomly by 90° to either the left or the right. The 11 remaining items were L-shaped distractors rotated randomly at orthogonal orientations (0°, 90°, 180°, or 270°). Figure 1C presents example display layouts (an illustration can be seen in Appendix III, Figure 1).
To monitor and record eye movements, a video-based eye-tracker was used (EyeLink 1000; SR Research Ltd., Mississauga, Ontario, Canada; version 4.594). Eye-movement recordings were calibrated at the start of the experiment and after every 64 trials. Calibration was considered accurate when fixation positions fell within ~1° for all calibration points. The default psychophysical sample configuration of the eye-tracking system (i.e., saccade velocity threshold set at 35°/s, saccade acceleration threshold set at 9500°/s²) was adopted for identifying saccadic eye movements.
Trial sequence
A trial started with the presentation of a central fixation cross (0.10° x 0.10°, luminance: 1.0 cd/m²) for 500 ms. Next, the fixation cross was removed from the screen, and, following a blank interval of 200 ms, the search display was presented. Observers were instructed to find the target “T” and respond as quickly and accurately as possible to its (left vs. right) orientation, while being allowed to move their eyes freely. Each search display stayed on the screen until a manual response was given. If the “T” was rotated to the right (left), observers responded by pressing the right (left) arrow button on a computer keyboard with their right (left) index finger. Following a response error, the word “Wrong” appeared in the screen center for 1000 ms. Each trial was followed by a blank inter-trial interval of 1000 ms. The experiment consisted of 256 trials (32 blocks x 8 trials each, 50% repeated displays in each block). Participants were free to proceed to the next block at their own pace. The search task took approximately 30 minutes to complete.
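The block structure described above can be illustrated with a minimal Python sketch. It is not the original Matlab/Psychtoolbox routine: the display labels, the per-participant seed, and the assignment of four fixed non-repeated displays to each block are assumptions made for illustration only.

```python
import random

N_BLOCKS = 32
# Hypothetical labels for the 4 repeated displays (not taken from the original code).
REPEATED_IDS = ["rep_1", "rep_2", "rep_3", "rep_4"]


def build_schedule(seed):
    """Build one participant's trial list: 32 blocks x (4 repeated + 4 non-repeated)."""
    rng = random.Random(seed)  # assumption: one seed per participant
    schedule = []
    for block in range(1, N_BLOCKS + 1):
        # Assumption: non-repeated displays 1..128 are assigned to blocks in fixed
        # groups of four, so the display set is identical for all participants.
        nonrep_ids = [f"nonrep_{(block - 1) * 4 + i + 1}" for i in range(4)]
        trials = [(d, "repeated") for d in REPEATED_IDS]
        trials += [(d, "non-repeated") for d in nonrep_ids]
        rng.shuffle(trials)  # trial order is randomized within each block
        for display_id, context in trials:
            schedule.append({
                "block": block,
                "display": display_id,
                "context": context,
                # Target orientation is random, hence unpredictable from the context.
                "target_orientation": rng.choice(["left", "right"]),
            })
    return schedule


schedule = build_schedule(seed=1)
print(len(schedule))
```

Under these assumptions, every participant sees the same 4 repeated and 128 non-repeated displays, and only the within-block order and the target orientation vary with the seed.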
Recognition test
At the end of the experiment, observers performed a yes/no (repeated/non-repeated display) recognition test, permitting us to assess whether they had acquired any explicit memory of repeated configurations presented in the preceding search task (a standard procedure in contextual-cueing experiments; see, e.g., Chun & Jiang, 1998). To this end, observers were presented with 4 repeated displays and 4 newly composed displays. The task was to indicate whether or not a given display had been shown previously, by pressing the left or the right mouse button, respectively. The 4 repeated and the 4 newly generated displays were presented in random order. Observers’ responses in the recognition task were non-speeded and no error feedback was provided.
Statistical analysis
Comparisons of scanpaths were carried out in Python (Van Rossum & Drake, 1995). Statistical analysis was performed using R (version 3.4.3; R Core Team, 2018). Because the same set of non-/repeated displays was used across participants with evenly distributed target positions across the four display quadrants, we analyzed our dependent measures (reaction times, error rates, oculomotor variables) using the lme4 package in R for mixed-effect modeling, to account for these dependencies by including target quadrant and participant as random factors in addition to the fixed factors of block and context.
Dependent measures
To establish comparability between the present investigation and previous contextual-cueing studies, as well as to validate our dataset as being representative for visual-search paradigms, we begin our analysis with an examination and replication of established measures of the contextual-facilitation effect: RT, fixation number, and saccade amplitude (see, e.g., Tseng & Li, 2004, or Peterson & Kramer, 2001). We then proceed to the presentation of scanpath measures of contextual facilitation, including the total length of the scanpaths (Brockmole & Henderson, 2006; Pollmann & Manginelli, 2009) and the standard deviation of the lengths of the saccades constituting each scanpath. The latter essentially provides a new measure of the variability of saccade lengths across individual observers and displays, where a decrease in variability can be considered a measure of automaticity (Logan, 1988). This is followed by new overlay-plot visualizations of individual participants’ spatiotemporal sequence of oculomotor behavior (which are also meant to demonstrate the usefulness of our scanpath approach to eye-tracking investigations of visual search in general). From these visualizations, a new quantitative measure of contextual cueing, namely: scanpath similarity or consistency, is derived.
Reaction times.
For the RT analyses, error trials and ‘extreme’ RTs more than three standard deviations below or above the mean were excluded from the data. This outlier criterion led to the removal of ∼3% of all trials. Individual observers’ mean RTs, and associated error rates, were calculated per experimental condition and submitted to a 2 × 32 repeated-measures linear mixed-model analysis of variance (ANOVA) with Context (repeated, non-repeated) and Block (1–32) as fixed factors and Participant (1–46) and Target Quadrant (upper-left, upper-right, lower-left, lower-right) as random factors.
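The exclusion rule can be sketched in Python as follows. The text does not state whether the mean and standard deviation were computed per participant, per condition, or over the whole sample; this sketch assumes they are computed over the correct trials of whatever set of RTs is passed in. The function name and the toy data are hypothetical.

```python
import numpy as np


def trim_rts(rts, correct, n_sd=3.0):
    """Return a boolean mask of trials kept for the RT analysis.

    Error trials are dropped first; the mean and SD are then computed over the
    remaining (correct) trials, and RTs further than n_sd SDs from that mean
    are dropped as well. Assumption: the reference set is the array passed in.
    """
    rts = np.asarray(rts, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    valid = rts[correct]
    mean, sd = valid.mean(), valid.std(ddof=1)
    return correct & (np.abs(rts - mean) <= n_sd * sd)


# Toy data (ms): 20 ordinary RTs, one extreme RT, and one error trial.
rts = [900, 1100] * 10 + [9000, 1000]
correct = [True] * 21 + [False]
keep = trim_rts(rts, correct)
print(int(keep.sum()), "of", len(rts), "trials kept")
```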
Analysis of scanpath similarity
The similarity of scanpaths was computed using established measures in the field (cf. Fahimi & Bruce, 2020; Jekel et al., 2019), in particular, Dynamic Time Warping, Discrete Fréchet Distance, and Area Between Curves.
Dynamic Time Warping is a measure of similarity between two fixational series of different lengths. Two individual scanpaths may be highly similar with regard to the placing (i.e., the spatial coordinates) of individual fixations, but the temporal alignment of these sequences may be less consistent across individual trials. The strength of Dynamic Time Warping is that it can quantify the similarity of the shapes of scanpaths with distinct time series. Specifically, this metric compares two fixational series by aligning them in the time domain so as to minimize the cumulative Euclidean distance between the aligned series. The Discrete Fréchet Distance can likewise deal with fixational time series of different length (and tempo). The Fréchet Distance considers both the location and the ordering of the individual fixations along two scanpath curves and can be illustrated by an analogy: a person is walking a dog on a leash, with the person walking along one (scan-)path/curve and the dog along another. The Fréchet Distance corresponds to the length of the shortest leash that allows both curves to be traversed from start to end; its discrete version compares only the distances between fixations, not between points in between. Finally, we computed scanpath similarity based on Area Between Curves, which, like Dynamic Time Warping and the Discrete Fréchet Distance, permits comparisons of scanpaths with different lengths, although this particular measure is based on the area that falls between two scanpath curves. As Area Between Curves is well-suited to quantify hysteresis (Jekel et al., 2019), this measure should be particularly sensitive in capturing scanpath similarity when trajectories have the same start and end points (initial fixation point and target location).
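To make the first two metrics concrete, the following is a minimal Python/NumPy sketch of textbook dynamic-programming formulations of Dynamic Time Warping and the Discrete Fréchet Distance. It is not the code used for the reported analyses, and the function names and toy scanpaths are hypothetical. Area Between Curves is omitted because it requires a more involved polygon construction.

```python
import numpy as np


def _pairwise_dist(p, q):
    """Euclidean distance between every fixation in p and every fixation in q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)


def dtw_distance(p, q):
    """Minimal cumulative distance over all monotonic alignments of p and q."""
    d = _pairwise_dist(p, q)
    n, m = d.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])


def discrete_frechet(p, q):
    """Shortest 'leash' that lets both fixation sequences be traversed in order."""
    d = _pairwise_dist(p, q)
    n, m = d.shape
    ca = np.full((n + 1, m + 1), np.inf)
    ca[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The leash must cover the current pair and the best way of reaching it.
            ca[i, j] = max(
                d[i - 1, j - 1],
                min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1]),
            )
    return float(ca[n, m])


# Toy scanpaths (x, y in arbitrary units): two parallel paths, one unit apart.
path_a = [(0, 0), (1, 0), (2, 0)]
path_b = [(0, 1), (1, 1), (2, 1)]
print(dtw_distance(path_a, path_b))      # 3.0: three aligned pairs, 1 unit each
print(discrete_frechet(path_a, path_b))  # 1.0: a leash of 1 unit suffices
```

Both functions accept scanpaths with different numbers of fixations. For both metrics, smaller values indicate more similar scanpaths.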
We chose to explore three scanpath metrics, rather than just one, in order to provide a maximally precise and unbiased measurement of the effects of search-task training on oculomotor behavior (assuming that not all scanpath metrics capture the underlying effects equally; cf. Fahimi & Bruce, 2020). Our specific trial schedule (see Figure 1A and B) allowed us to examine the similarity of scanpaths in multiple ways (for further details, see Appendix I). First, we compared each possible pair of gaze patterns arising from identical displays over different participants. This approach enabled us to compute scanpath similarity for each experimental block (each consisting of 4 repeated and 4 non-repeated arrays), thereby addressing the important question of how the consistency of viewing patterns changes as a function of practice on the task. Second, we computed similarity of oculomotor trajectories between each pair of different displays viewed by the same participant. This analysis was intended as a strong test of the display-specific vs. general-procedural accounts of contextual repetitions on search-task training.
To more formally examine whether these observations represent meaningful effects, we computed scanpath similarity for each experimental block (1-32). To recap our hypothesis: if participants are learning a generic search procedure that is increasingly effective, then similarity of scanpaths should increase over time for both repeated and non-repeated arrays, though this effect should be more pronounced for the former displays which, due to being repeated, accrue a greater weight in shaping the generic search procedure. But the prediction would be fundamentally different for the display-specific hypothesis of contextual cueing, according to which experience with individual repeated displays leads to the build-up of display-specific memories and associated scanning behavior. Accordingly, scanpath similarity obtained for pairs of individual repeated displays with different spatial composition should decrease with increased search-task training (and similarity measures should effectively be lower than those for non-repeated displays). Two analyses were conducted (see Figure 1A and B; also Appendix I). In the first, within-display analysis, similarity of eye-movement sequences was calculated from each pair of different participants when viewing a given, individual (repeated or non-repeated) display. Second, in the within-participant analysis, similarity measures were generated from eye-movement sequences in pairs of different displays when searched by an individual participant. Both analyses were conducted for three similarity measures: Dynamic Time Warping, Discrete Fréchet Distance, and Area Between Curves. Statistical inference was based on mixed-effect ANOVAs with the fixed factors of Context and Block and the random factors of Target Quadrant (in the analysis of within-display similarity) and Participant (within-participant scanpath analysis).
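The two pairing schemes can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code itself: the dictionary layout keyed by (participant, display, block), the function names, the use of the mean to summarize pairwise distances, and the toy data are all hypothetical, and a compact Dynamic Time Warping function stands in for any of the three similarity measures.

```python
from itertools import combinations

import numpy as np


def dtw(p, q):
    """Compact Dynamic Time Warping distance between two fixation sequences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    acc = np.full((len(p) + 1, len(q) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[-1, -1])


def _mean_pairwise(paths, dist):
    """Mean distance over all unordered pairs; NaN if fewer than two paths."""
    pairs = list(combinations(paths, 2))
    if not pairs:
        return float("nan")
    return float(np.mean([dist(a, b) for a, b in pairs]))


def within_display(scanpaths, display, block, dist=dtw):
    """Pairs of different participants viewing the same display in one block."""
    paths = [sp for (pid, disp, blk), sp in scanpaths.items()
             if disp == display and blk == block]
    return _mean_pairwise(paths, dist)


def within_participant(scanpaths, participant, block, displays, dist=dtw):
    """Pairs of different displays viewed by the same participant in one block."""
    paths = [scanpaths[(participant, d, block)] for d in displays
             if (participant, d, block) in scanpaths]
    return _mean_pairwise(paths, dist)


# Toy data: keys are (participant, display, block); values are fixation lists.
scanpaths = {
    ("p1", "rep_1", 1): [(0, 0), (1, 0), (2, 0)],
    ("p2", "rep_1", 1): [(0, 1), (1, 1), (2, 1)],
    ("p1", "rep_2", 1): [(0, 0), (1, 0), (2, 0)],
    ("p2", "rep_2", 1): [(0, 0), (0, 2)],
}
print(within_display(scanpaths, "rep_1", 1))
print(within_participant(scanpaths, "p1", 1, ["rep_1", "rep_2"]))
```

Because these are distance measures, lower values correspond to higher scanpath similarity. Computing both functions per block yields the values that would then enter the Context × Block models.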