Multi-level Crowding: Evidence from Global and Local Misreports

Crowding refers to the inability to recognize objects in clutter, and it sets a fundamental limit on object recognition. Here, we investigated the processing level at which crowding occurs by exploring the type of crowding errors (global, local, or both). Twenty-seven observers estimated the orientation of a target presented alone or surrounded by flankers (local shapes). Flankers were aligned to create an illusory rectangle (enhanced global configuration) or misaligned (reduced global configuration). We analyzed the error distributions by fitting probabilistic mixture models. Results showed that participants often misreported the orientation of a flanker instead of that of the target. Interestingly, in some trials the orientation of the global configuration was misreported. These results suggest that crowding occurs simultaneously across multiple levels of visual processing and crucially depends on the spatial configuration of the stimulus. Thus, crowding might be characterized as a bottleneck of visual object identification at different levels of representation.

be irretrievably lost (Freeman & Simoncelli, 2011; Herzog, Sayim, Chicherov, & Manassi, 2015). If crowding occurs due to the averaging of low-level visual features between target and flankers at an early visual stage, it would disrupt the formation of any higher-level target representation, and crowding would represent an early bottleneck of human vision.
However, in contrast to the prediction of a strict "pooling" or averaging mechanism, crowding errors often reflect reports of a flanker instead of the target, known as misreport errors (Ester, for a review). For example, Kimchi and Pirkner (2015) examined whether crowding could occur at the object configural level in addition to feature- or part-level crowding. Results showed that crowding was weaker when the flankers were similar to the target's local parts than when the flankers were similar to the target's whole configuration. Herzog et al. (2015) showed that the same vertical flankers lose their crowding strength when they become part of rectangles or good Gestalts. Interestingly, a recent study by Doerig, Bornet, Rosenholtz, Francis, Clarke and Herzog (2019) showed that incorporating a grouping component into crowding models strongly improves model performance. These findings suggest that crowding can occur at the global or configural levels and that grouping processes precede crowding. However, it is still unknown whether and how crowding processes operate at the local and global levels of stimulus representation.
In this study, we addressed this question by exploring whether and how misreport errors depend on the level of visual processing. Observers estimated the orientation of a target (a black rectangle with two triangular cut-outs) presented alone or surrounded by flankers in two conditions (Figure 1). In one condition, the flankers were aligned to create a coherent illusory rectangular configuration. Illusory, or Kanizsa, shapes (Kanizsa, 1979) refer to the perception of a shape defined by sharp illusory contours (see Spillmann & Dresp, 1995, for a review). In the second condition, the flankers were misaligned, so no illusory shape was formed; yet the inducers might still be grouped according to Gestalt principles (e.g., Kimchi, 1998), forming a global shape that is presumably weaker than the illusory one.
On each trial, we calculated the estimation error for orientation by subtracting the true value of the target from the estimated value, and we analyzed the error distributions by fitting probabilistic mixture models. If crowding occurs only at a lower level of visual processing, we expect crowding to be induced by local shapes but not by global shapes; namely, observers will misreport the orientation of a flanker, but not the orientation of the global shape formed by the flankers, as the target orientation. If crowding also occurs at a higher level of visual processing, we expect crowding to be induced by both local and global shapes (i.e., observers will misreport local as well as global orientations as the target orientation).
Manipulating the strength of the global shape (illusory vs. grouped) would allow us to examine how different perceptual organization processes modulate crowding.

Observers
Twenty-seven graduate and undergraduate students (17 females; age range = 18-39 years, M = 28.23, SD = 6.07) from the University of Haifa participated in the experiment for course credit or monetary payment (40 ILS per hour, approximately $12). The sample size was determined by an a priori power analysis to detect a crowding effect with 80% power and a moderate effect size (0.5), given a .05 significance criterion. All observers had normal or corrected-to-normal visual acuity and normal color vision. None of them reported attention deficits or epilepsy. Participants signed written informed consent before the experiment. The study was carried out in accordance with the Declaration of Helsinki and was approved by the Human Ethics Committee of the University of Haifa.

Apparatus
The stimuli were displayed on a gamma-corrected 21-in CRT monitor (SGI, with 1280 × 960 resolution and 85-Hz refresh rate) connected to an iMac and were programmed in Matlab (The MathWorks, Inc., Natick, MA) using the Psychophysics Toolbox extensions (Kleiner, Brainard, & Pelli, 2007). An Eyelink 1000 Plus (SR Research, Ottawa, ON, Canada) system was used to monitor eye movements, and viewing distance was set to 57 cm using a chin rest. Observers used the mouse to report their responses.
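Because viewing distance was fixed at 57 cm, stimulus sizes in degrees map almost one-to-one onto centimetres on the screen (1° ≈ 0.99 cm). The Python sketch below converts the stimulus dimensions reported for this experiment into screen units. It is illustrative only: the pixel density `PX_PER_CM` is a hypothetical value (the monitor's viewable width is not reported), and sizes are computed as if centred on the line of sight, which slightly underestimates their physical extent at 9° eccentricity on a flat screen.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # chin-rest viewing distance reported in the Apparatus section


def size_deg_to_cm(size_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen extent (cm) of a stimulus subtending `size_deg`, centred on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(size_deg) / 2.0)


def ecc_deg_to_cm(ecc_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen distance (cm) from fixation of a point at `ecc_deg` eccentricity."""
    return distance_cm * math.tan(math.radians(ecc_deg))


# HYPOTHETICAL pixel density: 1280 px over an assumed 40-cm visible width.
PX_PER_CM = 1280 / 40.0

if __name__ == "__main__":
    for label, deg in [("target height", 1.1), ("target width", 0.9),
                       ("target-flanker spacing", 1.8)]:
        cm = size_deg_to_cm(deg)
        print(f"{label}: {deg} deg -> {cm:.2f} cm -> {round(cm * PX_PER_CM)} px")
    ecc_cm = ecc_deg_to_cm(9.0)
    print(f"eccentricity: 9 deg -> {ecc_cm:.2f} cm -> {round(ecc_cm * PX_PER_CM)} px")
```

At 57 cm, 2 × 57 × tan(0.5°) ≈ 0.995 cm, which is why this distance is a common convenience in psychophysics.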

Stimuli
The stimuli were presented on a grey background with a luminance of 56 cd/m². The fixation display consisted of a black cross subtending 0.3° at the center of the screen. The target display consisted of the fixation cross along with the target: a black (0 cd/m²) rectangular shape subtending 1.1° in height and 0.9° in width, presented on the horizontal meridian in either the left or the right hemifield at 9° eccentricity. A triangular shape (0.1°) was cut out from two sides of the rectangle (Fig. 1A).
The target could appear alone (uncrowded condition) or surrounded by four flankers in one of two flanker configurations: the flankers aligned and flankers misaligned conditions. The flanker stimuli were identical to the target stimulus. The center-to-center target-flanker spacing was 1.8°.
Target and flanker orientations were selected randomly from a circular parameter space consisting of 180 values evenly distributed between 1° and 180°. In both the flankers aligned and flankers misaligned conditions, diagonal flankers always had the same orientation; therefore, flankers were always rotated in pairs. In the flankers aligned condition, the flankers' triangular corners were aligned so that they created an illusory rectangle (2° in height and 3° in width; see Fig. 1A).

Procedure and design
Figure 1B illustrates the trial sequence. We instructed observers to fixate on the fixation cross during the trial presentations. Each trial began with the fixation display, which was presented for 500 ms and remained on screen until the observer had fixated the fixation cross for 300 consecutive ms. Following fixation, the target display appeared for 200 ms. The target display was followed by a blank screen for 500 ms and then by a response screen. During the response display, observers estimated the target orientation by selecting a position for the target stimulus on an orientation wheel (see Fig. 1B). Following the observer's response, a blank inter-trial interval (ITI) appeared for 500 ms.
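Display durations on the 85-Hz CRT are quantised to frames of 1000/85 ≈ 11.76 ms. The sketch below, which assumes durations were rounded to the nearest frame (the rounding rule actually used is not reported), shows that the 200-ms target display corresponds exactly to 17 frames, whereas 500 ms falls between 42 and 43 frames.

```python
import math

REFRESH_HZ = 85  # CRT refresh rate reported in the Apparatus section


def frames_for(duration_ms: float, refresh_hz: float = REFRESH_HZ) -> int:
    """Nearest whole number of refresh frames for a duration (ties rounded up)."""
    return math.floor(duration_ms * refresh_hz / 1000.0 + 0.5)


def frames_to_ms(n_frames: int, refresh_hz: float = REFRESH_HZ) -> float:
    """Actual on-screen duration of n_frames refreshes."""
    return n_frames * 1000.0 / refresh_hz


# Nominal durations of the displayed screens in the trial sequence (ms).
TRIAL_EVENTS = {"fixation (minimum)": 500, "target display": 200,
                "blank": 500, "ITI": 500}

if __name__ == "__main__":
    for name, ms in TRIAL_EVENTS.items():
        n = frames_for(ms)
        print(f"{name}: {ms} ms -> {n} frames = {frames_to_ms(n):.1f} ms")
```

Under this rounding assumption, each nominal 500-ms interval would last about 505.9 ms.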
Observers were presented with 200 trials for each of the three display conditions (uncrowded, flankers aligned, and flankers misaligned), for a total of 600 trials per session. The experiment was divided into 10 blocks of 60 trials and lasted approximately 60 minutes. Display conditions were randomly mixed within each block. Observers were advised to take short rests between blocks.
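The design can be sketched as a trial-list generator. This is a hypothetical Python illustration, not the authors' Matlab code: it assumes each block contains 20 trials of each display condition (the text states only that conditions were randomly mixed within blocks) and lists the four flankers clockwise so that diagonal partners share one orientation.

```python
from collections import Counter

import numpy as np

CONDITIONS = ("uncrowded", "flankers_aligned", "flankers_misaligned")
N_BLOCKS, TRIALS_PER_BLOCK = 10, 60


def make_session(seed=0):
    """Return 10 blocks of 60 trial dicts (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: each block holds 20 trials of each display condition.
    per_condition = TRIALS_PER_BLOCK // len(CONDITIONS)
    blocks = []
    for _ in range(N_BLOCKS):
        conditions = np.repeat(CONDITIONS, per_condition)
        rng.shuffle(conditions)  # random order of conditions within the block
        block = []
        for cond in conditions:
            # Orientations are integers in 1..180 deg (180 evenly spaced values).
            target, pair_a, pair_b = (int(v) for v in rng.integers(1, 181, size=3))
            block.append({
                "condition": str(cond),
                "target": target,
                # Flankers listed clockwise: positions 0/2 and 1/3 are diagonal
                # partners sharing one orientation (unused when uncrowded).
                "flankers": (pair_a, pair_b, pair_a, pair_b),
            })
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    session = make_session()
    print(Counter(t["condition"] for block in session for t in block))
```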
On each trial, eye fixation was monitored with the eye tracker (see Apparatus). Trials in which fixation was broken (fixation window < 2° for twenty observers and < 3° for seven observers) were eliminated from the data and rerun at the end of the block.
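The rerun rule for fixation breaks amounts to a queue: a broken trial is moved to the end of the current block and presented again. In the sketch below, `run_trial` is a hypothetical callback standing in for stimulus presentation plus eye-tracker monitoring, and `gaze_in_window` assumes gaze coordinates already expressed in degrees relative to the fixation cross. The `max_presentations` guard is an added safety limit, not part of the described procedure.

```python
import math
from collections import deque


def gaze_in_window(gaze_deg, fix_deg=(0.0, 0.0), radius_deg=2.0):
    """True if gaze (x, y in deg) lies within radius_deg of the fixation cross."""
    return math.hypot(gaze_deg[0] - fix_deg[0], gaze_deg[1] - fix_deg[1]) < radius_deg


def run_block(trials, run_trial, max_presentations=1000):
    """Run one block; trials with a fixation break are re-queued at the end.

    run_trial (hypothetical) presents a trial and returns True if fixation
    was maintained throughout, False if it was broken.
    """
    queue = deque(trials)
    completed = []
    presentations = 0
    while queue:
        if presentations >= max_presentations:
            raise RuntimeError("too many fixation breaks in this block")
        trial = queue.popleft()
        presentations += 1
        if run_trial(trial):
            completed.append(trial)  # valid trial: keep its response
        else:
            queue.append(trial)      # broken fixation: discard and rerun later
    return completed
```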

Models and analyses
The estimation error for orientation reports was calculated by subtracting the true value of the target from the estimated value on each trial, such that zero indicated the target value. To assess the contribution of the flankers to the error distribution, we also calculated the flankers' values by subtracting the true value of the target from the values of the flankers. The error distributions were analyzed by fitting probabilistic mixture models, developed from both the standard model and the standard-with-misreport model (Bays, Catalao, & Husain, 2009). We compared three different models. The Standard Mixture Model (Equation 1) has two components: a von Mises (circular) distribution that describes the probability density of reports around the target's orientation, and a uniform distribution that describes the probability of reports that are unrelated to the target (guessing rate). Thus, the model includes two free parameters (γ, σ):

$$p(\theta) = (1-\gamma)\,f_{\sigma}(\theta) + \frac{\gamma}{n} \qquad (1)$$

where θ is the estimation error, γ is the proportion of trials in which observers are randomly guessing (guessing rate), f_σ(θ) is the von Mises distribution with standard deviation σ (variability; the mean was set to zero), and n is the total number of possible values for the target's orientation (in our case, 180).
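Equation 1 can be illustrated with a short maximum-likelihood sketch in Python. Several choices here are assumptions rather than the authors' implementation: the von Mises density is evaluated on the doubled angle (orientation repeats every 180°) and normalised over the 180 discrete error values so that the uniform term is exactly γ/n; the concentration κ is fitted instead of σ, with `circular_sd_deg` converting κ into a circular SD in degrees; and a coarse grid search replaces whatever optimiser was actually used.

```python
import numpy as np

N_VALUES = 180                   # possible orientation values (1..180 deg)
ERROR_GRID = np.arange(-90, 90)  # every possible estimation error, in deg


def wrap_error(estimate, true_value):
    """Signed error in [-90, 90): orientation repeats every 180 deg."""
    return (np.asarray(estimate) - np.asarray(true_value) + 90) % 180 - 90


def von_mises_pmf(kappa):
    """Zero-mean von Mises over the 180 discrete errors; the angle is doubled
    so that one 180-deg orientation cycle spans the full circle."""
    w = np.exp(kappa * (np.cos(np.deg2rad(2.0 * ERROR_GRID)) - 1.0))
    return w / w.sum()


def standard_mixture_pmf(gamma, kappa):
    """Equation 1 over ERROR_GRID: (1 - gamma) * f(theta) + gamma / n."""
    return (1.0 - gamma) * von_mises_pmf(kappa) + gamma / N_VALUES


def circular_sd_deg(kappa):
    """Circular SD of the von Mises component, in orientation degrees."""
    r = np.sum(von_mises_pmf(kappa) * np.cos(np.deg2rad(2.0 * ERROR_GRID)))
    return np.sqrt(-2.0 * np.log(r)) * 90.0 / np.pi


def fit_standard(errors):
    """Grid-search ML fit; returns (log-likelihood, gamma, kappa)."""
    idx = (np.rint(errors).astype(int) + 90) % 180  # error -> index into ERROR_GRID
    gammas = np.linspace(0.0, 0.99, 100)
    best = (-np.inf, np.nan, np.nan)
    for kappa in np.geomspace(0.1, 200.0, 120):
        f = von_mises_pmf(kappa)[idx]
        probs = (1.0 - gammas[:, None]) * f[None, :] + gammas[:, None] / N_VALUES
        ll = np.log(probs).sum(axis=1)  # one log-likelihood per gamma value
        i = int(np.argmax(ll))
        if ll[i] > best[0]:
            best = (float(ll[i]), float(gammas[i]), float(kappa))
    return best
```

With real data, `errors` would be `wrap_error(responses, targets)` for one observer in one condition; with simulated data the grid search approximately recovers the generating γ and κ.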

The Local model (Equation 2) includes three free parameters. It adds a misreporting component to the standard mixture model, which describes the probability of reporting the orientation of any of the four local flankers as the target orientation:

$$p(\theta) = (1-\gamma-\beta)\,f_{\sigma}(\theta) + \frac{\gamma}{n} + \frac{\beta}{m}\sum_{i=1}^{m} f_{\sigma}(\theta^{*}_{i}) \qquad (2)$$

where β is the probability of reporting a flanker orientation as the target orientation, m is the total number of nontarget items (four in the present study), and θ*_i is the error relative to the feature of the ith flanker. Note that in this model the von Mises distribution of the estimation errors, f_σ(θ), describes the distribution when the observer correctly estimated the target's feature; thus, its mean is zero. Expressed in terms of the target error θ, the component f_σ(θ*_i), which represents the distribution of estimates of one flanker, is centred on the orientation distance between that flanker and the target. Here, the variability of the distributions for each stimulus is assumed to be the same.
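Equation 2 can be sketched in the same way. The functions below are a hypothetical illustration under the same discretisation assumptions as a grid-based version of Equation 1: each flanker component is the shared von Mises centred on that flanker's offset from the target, so θ*_i is the error relative to flanker i. Flanker offsets are assumed to be whole degrees, which keeps every component normalised over the 180 error values.

```python
import numpy as np

N_VALUES = 180
ERROR_GRID = np.arange(-90, 90)


def wrap(x):
    """Wrap orientation differences (deg) to [-90, 90)."""
    return (np.asarray(x) + 90) % 180 - 90


def von_mises_pmf(kappa):
    """Zero-mean von Mises on the doubled angle, normalised over ERROR_GRID."""
    w = np.exp(kappa * (np.cos(np.deg2rad(2.0 * ERROR_GRID)) - 1.0))
    return w / w.sum()


def local_model_prob(params, errors, flanker_offsets):
    """Per-trial probability under Equation 2.

    errors          : (T,) response minus target, in deg
    flanker_offsets : (T, m) flanker minus target, in whole deg
    params          : (gamma, kappa, beta); one kappa shared by all components
    """
    gamma, kappa, beta = params
    f = von_mises_pmf(kappa)
    errors = np.asarray(errors)
    offsets = np.asarray(flanker_offsets)
    m = offsets.shape[1]
    idx_target = (np.rint(errors).astype(int) + 90) % 180
    # theta*_i: error relative to flanker i (response minus flanker orientation).
    theta_star = wrap(errors[:, None] - offsets)
    idx_flank = (np.rint(theta_star).astype(int) + 90) % 180
    flanker_term = f[idx_flank].sum(axis=1) / m
    return (1.0 - gamma - beta) * f[idx_target] + gamma / N_VALUES + beta * flanker_term


def local_model_loglik(params, errors, flanker_offsets):
    """Summed log-likelihood; -inf outside the valid parameter space."""
    gamma, kappa, beta = params
    if gamma < 0 or beta < 0 or gamma + beta > 1 or kappa <= 0:
        return -np.inf
    return float(np.log(local_model_prob(params, errors, flanker_offsets)).sum())
```

With β = 0 the function reduces to Equation 1, which is a convenient sanity check when fitting.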

Results
For each observer in each condition, we examined the bias of the errors by calculating the mean error.
Mean error was close to zero in the uncrowded (M = -0.78, SD = 1.57), flankers aligned (M = 0.77, SD = 3.14), and flankers misaligned (M = 1.12, SD = 3.68) conditions. We then calculated precision as the inverse of the variance of the errors for each observer in each condition (Fig. 2A). We conducted a one-way analysis of variance (ANOVA) on precision with display condition (uncrowded, flankers aligned, flankers misaligned) as a within-subject factor. A main effect on precision was observed, F(2,52) = 130.59, p < 0.001, ηp² = 0.83, indicating significant differences in precision between the three display conditions. Pairwise comparisons (Bonferroni corrected) showed that precision was significantly higher when the target was presented uncrowded than when it was surrounded by flankers (aligned: p < 0.001; misaligned: p < 0.001). In contrast, precision did not differ between the two flanker conditions (p = .091).
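Precision and the Bonferroni-corrected pairwise tests reduce to two small computations. The sketch assumes precision uses the sample variance (ddof=1) of the wrapped errors; which variance estimator was used is not reported.

```python
import numpy as np


def precision(errors_deg):
    """Precision = 1 / variance of one observer's errors in one condition."""
    return 1.0 / np.var(np.asarray(errors_deg, dtype=float), ddof=1)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of comparisons, cap at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)
```

With three display conditions there are three pairwise comparisons, so each uncorrected p-value is multiplied by 3.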

Probabilistic models
In the uncrowded condition, the standard mixture model accurately described the error distribution (Fig. 2C). For each flanker condition, we compared the two relevant models (i.e., the Local model and the Global-Local model) by calculating the corrected Akaike information criterion (AICc) for each observer. Figure 2B shows the mean AICc differences between the relevant models in each flanker condition. The Global-Local model outperformed (i.e., had a lower AICc value than) the Local model in both the flankers aligned and flankers misaligned conditions [t(26) = 2.28, p = .031, Cohen's d = 0.44; t(26) = 3.02, p = .006, Cohen's d = 0.58, respectively]. We therefore analyzed the fitted parameters of the best-performing models (i.e., the standard mixture model for the uncrowded display and the Global-Local model for the two crowded displays).
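The model comparison uses the corrected AIC, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), where k is the number of free parameters and n the number of trials fitted. In the sketch below, k = 3 for the Local model comes from the text, whereas k = 4 for the Global-Local model (γ, σ, β_global, β_local) is an assumption, as is testing the per-observer AICc differences with a one-sample t statistic. The effect size is Cohen's d_z (mean / SD of those differences), which equals t/√N.

```python
import numpy as np

K_LOCAL = 3         # gamma, sigma, beta (stated in the text)
K_GLOBAL_LOCAL = 4  # ASSUMED: gamma, sigma, beta_global, beta_local


def aicc(log_likelihood, k, n):
    """Akaike information criterion with small-sample correction."""
    aic = 2.0 * k - 2.0 * log_likelihood
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def paired_t_and_dz(diffs):
    """One-sample t on per-observer differences, and Cohen's d_z = mean / SD."""
    d = np.asarray(diffs, dtype=float)
    mean, sd = d.mean(), d.std(ddof=1)
    t = mean / (sd / np.sqrt(d.size))
    return t, mean / sd
```

Per observer, a positive `aicc(ll_local, K_LOCAL, n) - aicc(ll_gl, K_GLOBAL_LOCAL, n)` favours the Global-Local model. With N = 27, d_z = t/√27, so t(26) = 2.28 and 3.02 give d_z ≈ 0.44 and 0.58, consistent with the reported effect sizes.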
We calculated the target reporting rate (P_T) for each fitted model by subtracting the summed guessing and misreport rates from 1 (i.e., P_T = 1 - γ for the standard mixture model and P_T = 1 - γ - β_global - β_local for the Global-Local model). Figure 3 depicts the mean guessing rate (γ), variability (σ), and target reporting rate (P_T) of the fitted models in each condition. To assess the effect of crowding on performance, we conducted one-way repeated-measures ANOVAs on guessing rate, variability, and target reporting rate as dependent variables, with display condition as a within-subject factor (Greenhouse-Geisser corrections were applied to p-values and degrees of freedom when the sphericity assumption was violated). For guessing rate, a main effect of display condition [F(2,38) = 4.99, p = 0.019, ηp² = 0.16] indicated differences between conditions. Pairwise comparisons showed that the guessing rate was significantly lower when the target was presented alone (uncrowded) than when it was flanked (aligned: p = .015; misaligned: p = .024), whereas the two flanker conditions did not differ (p = 1.000). For variability, a main effect of crowding [F(2,42) = 34.11, p < 0.001, ηp² = 0.57] also indicated differences between conditions. As for guessing rate, variability in the orientation reports was significantly lower when the target was presented alone than when it was flanked (aligned: p < 0.001; misaligned: p < 0.001), whereas the two flanker conditions did not differ (p = .350). For target reporting rate, a significant main effect was found [F(2,52) = 34.11, p < 0.001, ηp² = 0.57], indicating that target reporting rate differed between conditions. Here, pairwise comparisons revealed significant differences among all three conditions: the probability of reporting the target orientation was significantly higher when the target was presented alone than when it was presented with flankers (aligned: p < 0.001; misaligned: p < 0.001). In addition, the probability of reporting the target was significantly higher when the flankers were aligned than when they were misaligned (p = .012).
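The Global-Local model is not written out in this section, so the following is an assumed form only: it adds to Equation 2 a single global component, weighted by β_global and centred on the orientation of the global shape relative to the target, with the target component weighted by P_T = 1 - γ - β_global - β_local. How the global orientation was coded for each display is not specified here, so `global_offsets` is a hypothetical input.

```python
import numpy as np

N_VALUES = 180
ERROR_GRID = np.arange(-90, 90)


def wrap(x):
    """Wrap orientation differences (deg) to [-90, 90)."""
    return (np.asarray(x) + 90) % 180 - 90


def von_mises_pmf(kappa):
    """Zero-mean von Mises on the doubled angle, normalised over ERROR_GRID."""
    w = np.exp(kappa * (np.cos(np.deg2rad(2.0 * ERROR_GRID)) - 1.0))
    return w / w.sum()


def _idx(err):
    """Map whole-degree errors to indices into ERROR_GRID."""
    return (np.rint(err).astype(int) + 90) % 180


def target_report_rate(gamma, beta_global, beta_local):
    """P_T = 1 - gamma - beta_global - beta_local."""
    return 1.0 - gamma - beta_global - beta_local


def global_local_prob(params, errors, global_offsets, flanker_offsets):
    """ASSUMED Global-Local mixture: target + guessing + one global + m local components.

    global_offsets  : (T,) global-shape orientation minus target (hypothetical input)
    flanker_offsets : (T, m) flanker orientation minus target
    """
    gamma, kappa, beta_g, beta_l = params
    f = von_mises_pmf(kappa)
    errors = np.asarray(errors)
    p_t = target_report_rate(gamma, beta_g, beta_l)
    target = f[_idx(errors)]
    glob = f[_idx(wrap(errors - np.asarray(global_offsets)))]
    local = f[_idx(wrap(errors[:, None] - np.asarray(flanker_offsets)))].mean(axis=1)
    return p_t * target + gamma / N_VALUES + beta_g * glob + beta_l * local
```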
Finally, we analyzed the misreport rates for the global and local orientations. Figure 4 depicts the probability of misreporting the global orientation (i.e., that of either the illusory rectangle or the perceived global shape of the four flankers) or one of the local orientations (i.e., that of one of the flankers) in the two flanker conditions.
The probability of misreporting either a global or a local orientation was significantly different from 0 in both flanker conditions, as assessed by separate one-sample t-tests against 0 (all ps < 0.001), showing that participants misreported both the global and the local orientations as the target orientation in the two flanker conditions. In addition, the probability of misreporting the global shape did not differ between flanker conditions [t(26) = 0.37, p = .715], but the probability of misreporting a local flanker orientation was significantly higher when the flankers were misaligned [t(26) = -2.94, p = .007, Cohen's d = -0.56]. As a control, we explored the probability of misreporting the perpendicular (+90°) global orientation, which should be 0 because it is the orientation most distant from the actual orientation of the global shape. Indeed, separate t-tests showed that the probability of misreporting the perpendicular global orientation was close to 0 in each flanker condition, with mean misreport rates of 0.002 and 0.009 in the flankers aligned and flankers misaligned conditions, respectively.
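The perpendicular control can be expressed as re-centring the global component on the orientation 90° away from the global shape, which is the largest possible distance in a 180° orientation space, and then testing the fitted rates against 0. The sketch assumes this re-centring and a one-sample t statistic across observers; the exact control fit is not described in the text.

```python
import numpy as np


def wrap(x):
    """Wrap orientation differences (deg) to [-90, 90)."""
    return (np.asarray(x) + 90) % 180 - 90


def perpendicular_offset(global_offset_deg):
    """Control orientation: the global orientation rotated by +90 deg (wrapped)."""
    return wrap(np.asarray(global_offset_deg) + 90)


def one_sample_t(values, popmean=0.0):
    """t statistic for testing per-observer rates against popmean."""
    v = np.asarray(values, dtype=float)
    return (v.mean() - popmean) / (v.std(ddof=1) / np.sqrt(v.size))
```

Passing `perpendicular_offset(global_offsets)` in place of the global offsets to a Global-Local likelihood would yield the control misreport rate for each observer.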

Discussion
In the present study, we investigated the processing level at which crowding occurs using crowded displays in which the flankers (local shapes) could form a global shape (with or without illusory contours), and we examined the extent to which crowding errors depend on these global and local levels. Our results show that observers misreported the orientation of both global and local shapes as the orientation of the local target, suggesting that crowding operates at multiple levels of visual processing. Observers were more precise and reported the target orientation more accurately when the target was uncrowded than when it was surrounded by flankers; in addition, error variability and guessing rate were significantly lower in the uncrowded condition. Interestingly, the probability of reporting the target was significantly higher when the flankers were aligned than when they were misaligned.
Importantly, we analyzed the misreport rates of the global and local orientations. If crowding occurs only at a lower level of visual processing, we expected observers to misreport the orientation of a flanker (local orientation), but not the orientation of the global shape (illusory or grouped) formed by the flankers (global orientation), as the target orientation. If, however, crowding also occurs at a higher level of visual processing, we expected observers to misreport local as well as global orientations as the target orientation. The latter is what we found. Model comparison showed that a model with both global and local misreports fit the data better than a model with only local misreports. Furthermore, model fitting revealed that observers significantly misreported both the global and local orientations as the target orientation, showing that the global orientation of the flanker configuration is preserved in crowded displays and suggesting that crowding may also occur at a higher level of visual processing, between global and local stimulus representations.
The manipulation of flanker alignment produced some interesting results. Even though the global orientation was misreported equally often when the flankers were aligned (enhanced global configuration) or misaligned (reduced global configuration), the probability of misreporting a local flanker orientation was significantly higher when the flankers were misaligned. This, together with the finding that flanker alignment produced higher target reporting rates (P_T), suggests that perception of the illusory shape modulated crowding, in line with previous findings showing that flankers might lose their crowding strength when they become part of rectangles or good Gestalts (Herzog et al., 2015). Note, however, that this modulation is not manifested as a higher probability of misreporting the orientation of the illusory rectangle, but rather as a lower probability of misreporting flanker orientations as the target orientation, indicating a reduction in the strength of crowding.
The finding that both the global orientation of the grouped flankers (flankers misaligned condition) and the orientation of the illusory rectangle (flankers aligned condition) produced similar misreport rates might be explained by recent evidence on the processing of illusory shapes under restrictive visual conditions or in peripheral vision. First, disentangling the perception of the illusory shape from the grouping of the local (flanker) elements is not easy when these configurations are presented under challenging visual conditions. This issue was explicitly studied by Jimenez and Montoro (2018), who presented masked illusory primes (black "pacmen" and semicircles arranged to produce horizontal or vertical illusory bars) and grouping primes (the same pacmen and semicircles rotated so that they did not produce illusory figures) whose orientation could be congruent or incongruent with that of subsequent probe stimuli (vertical vs. horizontal bars). The authors found significant priming effects for illusory and grouping primes at different prime durations but, crucially, the magnitude of the priming effect was equal for the illusory and grouped primes. Thus, the grouped percept is not easily dissociated from the illusory-shape percept under restrictive visual conditions. Second, retinal eccentricity seems to play a crucial role in illusory shape processing. Bakar, Liu, Conci, Elliott and Ioannides (2008) investigated this by presenting illusory shapes at central and peripheral visual field locations. Their behavioral results revealed faster responses to central stimulus presentations than to presentations in one of the four quadrants. In addition, magnetoencephalographic responses showed that, for central presentations, specific responses to illusory figures peaked first in V1/V2 (96-101 ms) and then in the lateral occipital complex (LOC; 132-141 ms). For peripheral presentations, the relative modulation by illusory figures was markedly reduced in V1/V2 and LOC, while prominent activation peaks shifted to the fusiform gyrus (from 200 ms onwards). Thus, the processing of illusory shapes in the periphery required longer latencies and the involvement of higher-level visual areas. Taken together, this evidence might explain why misreports of the global orientation did not differ between the two flanker conditions in our results.
In sum, the present results suggest that crowding occurs simultaneously across multiple levels of visual processing: at a lower level, between local shapes, and at a higher level, between global and local shapes. These results are in line with recent findings (Doerig et al., 2019) showing that the global stimulus configuration plays a crucial role in crowding, and they stress the necessity of including grouping-like processes in any fundamental explanation of crowding. Since both local flanker information and global shape representations are preserved (not averaged) in crowded displays, our results pose a further challenge to low-level "pooling" or averaging models of crowding. The development of new pooling models (Bornet, Choung, Doerig, Whitney, Herzog, & Manassi, 2021; Rosenholtz, Yu, & Keshvari, 2019) that incorporate the possibility of multilevel crowding and account for complex target-flanker interactions might lead to a better explanation of crowding in general, and of the current results in particular.

Data Availability
The methods used and the data analyzed in the present study are available in the Open Science Framework (OSF) repository at the following link: https://osf.io/f3cz8/?view_only=b47af6157a1941d48edbddd3b0ecea10

Author Contribution
AY and RK developed the study concept. All authors contributed to the study design. Testing and data collection were performed by MJ. MJ and AY performed the data analysis, and all authors contributed to the interpretation of the results. MJ drafted the manuscript, and AY and RK provided critical revisions. All authors approved the final version of the manuscript for submission.