Grouping Strategies in Numerosity Estimation between Intrinsic and Extrinsic Grouping Cues

The number of items in an array can be estimated quickly and accurately by dividing the array into subgroups, a strategy termed “groupitizing.” Similarly, a telephone number is easier to remember when it is divided into several segments. Different forms of visual grouping can affect the accuracy with which a large set of items is enumerated. Previous studies have found that grouping arrays by visual proximity or color similarity improves enumeration accuracy. Based on Gestalt theory, Palmer (1992) divided perceptual grouping principles into intrinsic (e.g., proximity, similarity) and extrinsic (e.g., connectedness, common region) principles. Previous studies have investigated the groupitizing effects of intrinsic grouping cues. However, to the best of our knowledge, no previous study has explored groupitizing effects for extrinsic grouping cues. Therefore, this study explored whether extrinsic and intrinsic grouping cues differ in their groupitizing effects on numerosity estimation. The results showed that both extrinsic and intrinsic grouping cues improved enumeration accuracy, and that the sensory precision of numerosity estimation was higher with extrinsic than with intrinsic grouping cues.


Introduction
Numerosity is a quantitative attribute of entities and an important dimension of nature. For example, one tree produces more fruit than another, and there are more sheep in one territory than in another. In the process of understanding and adapting to nature, humans and animals have gradually evolved the ability to perceive numerosity [1-3].
Three strategies are usually used in numerosity perception: subitizing, counting, and estimation [4,5]. For small sets (usually no more than 4 items), humans can quickly and accurately determine the number of items. This is called "subitizing," which indicates that the number is apprehended immediately, without deliberate thought [4,6,7]. As the number of items increases beyond 4, the time required to determine it increases correspondingly, because the observer must count the objects, which requires the coordination of many visual and spatial operations [8,9]. However, when the number of items is large and they cannot be counted in the time available (for example, when the stimulus is presented only briefly), numerosity perception becomes imprecise and, in accordance with Weber's law, relies on the approximate number system (ANS) [10]. People thus use estimation to determine the approximate number of objects [10-12].
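As a brief formal aside, Weber's law for numerosity is commonly written as scalar variability: the standard deviation of estimates grows in proportion to the numerosity, so their ratio stays roughly constant. The symbols N (numerosity), \sigma_N (standard deviation of the estimates of N), and w (Weber fraction) are notation introduced here for illustration only.

```latex
\sigma_N = w \, N
\quad\Longrightarrow\quad
\frac{\sigma_N}{N} = w
```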
Beyond counting and subitizing, recent studies have found that arrays visually divided into subgroups can be enumerated faster and more accurately than ungrouped arrays; this is called "groupitizing" [4,13-17]. Many studies have begun to explore the mechanisms of groupitizing. For example, Starkey and McCandliss (2014) found that groupitizing is positively correlated with the arithmetic ability of children and adults, indicating that grouping ability may reflect arithmetic strategies [13]. Moscoso et al. (2020) suggested that groupitizing is an attention-based process that depends on the subitizing system, and that mathematical ability is correlated with groupitizing [15]. Ciccione and Dehaene (2020) indicated that in groupitizing, subjects use mental multiplication and mental addition to increase the speed and accuracy of enumeration [4]. Wege et al. (2021) provided an explanation of how the numerical information needed for a mental calculation is extracted from grouped arrays, and suggested that the parallel subitizing of dots and groups in grouped arrays may represent the enumeration processes necessary for groupitizing via mental multiplication [18].
The groupitizing effect must be based on visual grouping. Research into visual grouping processes and perceptual organization developed against the background of Gestalt theory approximately one hundred years ago [19-21]. Wertheimer (1923) proposed the main principles of perceptual grouping, such as similarity, proximity, symmetry, good continuity, connectedness, and common region, which specify which regions of an image constitute objects or perceptual units [21]. Based on Gestalt theory, Palmer made an important distinction between intrinsic and extrinsic grouping principles. Like most classical Gestalt principles, intrinsic principles are based on the inherent relationships among attributes of the grouped elements (e.g., color, shape, size, position). In contrast, extrinsic principles are based on relationships between the elements and other, extrinsic elements that induce them to group [22-25].
Previous studies of groupitizing only involved intrinsic grouping cues (color similarity [4,14] and proximity [4,13-15]). To date, no research has explored numerosity estimation with extrinsic grouping cues. Thus, this study explored whether extrinsic grouping cues differ from intrinsic grouping cues in numerosity estimation. Previous studies have found that extrinsic grouping cues have advantages over intrinsic grouping cues [22,26-28]. Luna et al. suggested that observers respond more quickly to extrinsic than to other grouping cues [22,29]. Quinn and Bhatt reported that young infants (3-4 months old) are sensitive to extrinsic cues, especially common region and connectedness [22,29]. Therefore, we hypothesized that extrinsic grouping cues would be advantageous in numerosity estimation.
In addition, vision research has revealed that shape is crucial for object recognition [30-33]. In the absence of other visual information, it is easy for humans to use shape to identify objects. Human adults and children prefer to classify new objects according to their shape when given conflicting color and texture cues [30,33,34]. Accordingly, in this study, a shape similarity cue was added to the intrinsic grouping cues to test whether shape similarity has different effects from the other intrinsic grouping cues (i.e., color similarity and proximity).

Participants
Fifty-three freshman college students (mean age = 19 years, standard deviation = 2.4, range = 18-22) with normal or corrected-to-normal vision and no color blindness participated. We replicated the experiment in three groups of participants with low, medium, or high levels of math knowledge (for a similar approach, see Ciccione and Dehaene, 2020) [4]. At the highest level, we tested 16 science students majoring in mathematics, all of whom had scored more than 120 points on China's mathematics college entrance examination in 2020. For the medium level, we tested 18 humanities students with math scores between 60 and 90 in the college entrance examination (the mathematics component of the college entrance examination is more difficult for science students than for humanities students; the maximum score for mathematics is 150). For the low level, we tested 19 sports students who had never taken university-level exams in mathematics or related disciplines. This group had much more limited familiarity with mathematics (they had not been taught mathematics for at least one year).

Materials and Procedure
Stimuli were presented using E-Prime 2.0. Participants sat in a quiet, dimly lit room, 60 cm from a monitor (60 Hz). At the beginning of each trial, a fixation point was presented in the center of the screen; it remained on the screen throughout the experiment. After 500 ms, a stimulus was displayed for 500 ms, followed by a screen with an input box. Participants estimated the number of items presented and entered their estimate into the input box as quickly and accurately as possible using a numeric keypad (Fig. 1A). Response time was measured from stimulus offset to when the input box was present.
Each condition was tested in separate blocks, and participants were never explicitly informed of the grouping cues.

Stimuli
All stimuli were distributed in a 6° × 6° square grid consisting of 144 small squares, each 0.4° × 0.4°. Items were placed at the intersections of the grid lines, so that each item had 121 possible positions (Fig. 1B). We tested all numerosities between 5 and 17, giving 13 different numerosities. In the grouping conditions, each numerosity was divided into 2-4 subgroups, and each subgroup contained between 2 and 6 items. The configurations were as follows:

Extrinsic cues
Extrinsic cues included connectedness and common region.

Connectedness
In the connectedness conditions, the stimuli were sets of white squares (0.4° × 0.4°) with black borders randomly distributed in the grid. The squares within subgroups were connected by a black line, with the connection at the center of the square. In the no-grouping condition, there was no black line connection, and each item was randomly distributed in the large grid (Fig. 1B).
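The placement and connection rules described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the 11 × 11 index grid stands in for the 121 candidate positions, and the function names, the consecutive assignment of items to subgroups, and the chaining of items within a subgroup are assumptions of this sketch (the text does not state how the connecting lines were routed).

```python
import random

GRID_SIDE = 11  # 11 x 11 = 121 candidate intersections, as stated in the text


def sample_positions(n_items, rng):
    """Draw n_items distinct (col, row) grid intersections at random."""
    all_positions = [(c, r) for c in range(GRID_SIDE) for r in range(GRID_SIDE)]
    return rng.sample(all_positions, n_items)


def connect_subgroups(positions, subgroup_sizes):
    """Split positions into consecutive subgroups and chain the items of each.

    Returns (subgroups, segments). Each segment joins the centers of two items
    from the same subgroup. Chaining is an assumption of this sketch.
    """
    if sum(subgroup_sizes) != len(positions):
        raise ValueError("subgroup sizes must sum to the number of items")
    subgroups, segments, start = [], [], 0
    for size in subgroup_sizes:
        group = positions[start:start + size]
        subgroups.append(group)
        # Link each item to the next one in the same subgroup.
        segments.extend(zip(group, group[1:]))
        start += size
    return subgroups, segments


rng = random.Random(0)  # fixed seed so the example is repeatable
positions = sample_positions(9, rng)
subgroups, segments = connect_subgroups(positions, [3, 3, 3])
print(len(subgroups), len(segments))  # 3 subgroups, 6 connecting segments
```

In the no-grouping condition the same positions would simply be drawn without any segments.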

Common region
In the common region conditions, stimuli were also sets of white squares (0.4° × 0.4°) with black borders. The grid was divided into four quadrants, and the squares of each subgroup were randomly distributed inside the small square boxes (2.5° × 2.5°) in the four quadrants. For example, Fig. 1B is a 3, 3, 3 group with only three subgroups, so there are only three boxes. In the no-grouping condition, there were no small square boxes, and each item was randomly distributed in a large grid (Fig. 1B).

Intrinsic cues
Intrinsic cues included color similarity, shape similarity, and proximity.

Color similarity
The color similarity conditions were the same as those used by Anobile et al. (2020) [14]. Individual items (0.4° × 0.4°) could be red, blue, yellow, or green (RGB: 255 0 0; 0 0 255; 255 255 0; 0 255 0, respectively). Colors were arranged from left to right, so that items of the same color appeared in a vertical column (see Fig. 1C for a 3, 3, 3 group): the squares were first randomly arranged, then the first three squares (from left to right) were colored red, the next three blue, and the remaining three yellow (colors were randomly selected for each group). In the no-grouping condition, the positions of the squares were arranged with the same logic, but the colors were randomly assigned.
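The left-to-right coloring rule can be illustrated with a short sketch. The function name, the integer grid coordinates, and the tie-breaking by vertical position are hypothetical choices made here. The no-grouping branch also assumes that the number of items per color is kept the same and only the assignment is shuffled, which the text does not specify.

```python
import random

# RGB values quoted in the text
COLORS = {
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
}


def assign_colors(positions, subgroup_sizes, rng, grouped=True):
    """Return (position, color_name) pairs for one array.

    grouped=True: items are ordered left to right and each consecutive
    subgroup receives one color, drawn without replacement.
    grouped=False: same positions and color counts, but colors are shuffled
    across items (an assumption of this sketch).
    """
    if sum(subgroup_sizes) != len(positions):
        raise ValueError("subgroup sizes must sum to the number of items")
    ordered = sorted(positions)  # sort by x, ties broken by y
    palette = rng.sample(sorted(COLORS), len(subgroup_sizes))
    labels = [name for name, size in zip(palette, subgroup_sizes)
              for _ in range(size)]
    if not grouped:
        rng.shuffle(labels)
    return list(zip(ordered, labels))


rng = random.Random(1)  # fixed seed so the example is repeatable
grid = [(x, y) for x in range(11) for y in range(11)]
positions = rng.sample(grid, 9)
grouped_items = assign_colors(positions, [3, 3, 3], rng, grouped=True)
ungrouped_items = assign_colors(positions, [3, 3, 3], rng, grouped=False)
for (x, y), name in grouped_items:
    print(x, y, name, COLORS[name])
```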

Shape similarity
The shape similarity condition was similar to the color similarity condition. The only difference was that the four colors were replaced by four shapes: square, circle, triangle, and diamond; all shapes were 0.4° × 0.4°, and white with black borders (Fig. 1C).

Proximity
The proximity conditions were the same as those used by Anobile et al. (2020) [14]. Stimuli were arranged into four possible groups of 12 possible positions. Each group (spanning a maximum area of 4° × 2°) was located in the same quadrant and centered at 5° from the central fixation point. Each group was first randomly assigned to one quadrant (between 1 and 4); then, the individual item positions were randomly selected from the 12 positions in the selected quadrant. Within each quadrant, the maximum center-to-center distance between elements was 4°, and the minimum was 1°. In the no-grouping condition, each item was randomly distributed in the large grid (Fig. 1C).

Data analysis
We excluded trials with RTs more than three standard deviations from the average reaction time. The median response times for correct answers were computed for each subject. Precision was measured by the coefficient of variation (CV), a dimensionless precision index that allows performance to be compared across numerosities: CV = σ_i / N_i (Eq. 1), where N_i is the analyzed numerosity and σ_i is the standard deviation of the responses to numerosity i. Data were analyzed by repeated measures ANOVA, with effect sizes reported as η², using JASP and SPSS. In addition, we used Bayesian ANOVA for additional analysis, because quantifying evidence in favor of both difference and equality was crucial for testing our hypotheses (Wagenmakers et al., 2018). We report the Bayes factors in favor of the alternative hypothesis (BF10). A BF10 larger than 1 indicated evidence supporting the alternative hypothesis, and a BF10 less than 1 indicated evidence for the null hypothesis.
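The exclusion rule and the precision index described above can be sketched in a few lines. The function names are illustrative, and two details are assumptions because the text does not specify them: the sample (n - 1) standard deviation is used, and the exclusion is applied to whatever set of trials is passed in (for example, one participant's trials in one condition).

```python
import statistics


def exclude_rt_outliers(rts, n_sd=3.0):
    """Keep RTs lying within n_sd sample standard deviations of the mean."""
    if len(rts) < 2:
        return list(rts)
    mean = statistics.fmean(rts)
    sd = statistics.stdev(rts)
    return [rt for rt in rts if abs(rt - mean) <= n_sd * sd]


def coefficient_of_variation(responses, numerosity):
    """CV: standard deviation of the responses divided by the numerosity."""
    return statistics.stdev(responses) / numerosity


# Made-up values, for illustration only
demo_rts = [500.0] * 20 + [5000.0]
kept = exclude_rt_outliers(demo_rts)
print(len(demo_rts), len(kept))

demo_responses = [8, 9, 10, 9, 9]  # estimates given for a 9-item array
print(round(coefficient_of_variation(demo_responses, 9), 4))
```

A lower CV means less variable, and thus more precise, estimates for that numerosity.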

Statement
All coauthors agreed with the contents of the manuscript. The study with human subjects was conducted in accordance with the Declaration of Helsinki. This study was approved by the School of Psychology Ethics Committee at Guizhou Normal University. All participants signed informed consent forms prior to the experiment. All methods were carried out in accordance with relevant guidelines and regulations.

Results
Reaction times in the grouping condition were shorter than those in the no-grouping condition, and sensory precision was higher (see Tables 1 and 2). In addition, extrinsic grouping cues were associated with more precise numerosity estimation than intrinsic grouping cues.

Groupitizing and grouping of cues
As in several previous studies [14,15], we also investigated grouping effects on sensory precision, as indexed by the coefficient of variation (Eq. 1). CV is a classical psychophysical parameter; in numerosity perception, it reflects the sensory noise associated with the estimation process: the higher the CV value, the more sensory noise, and thus the less precise the estimates. We compared the grouping effects of extrinsic grouping cues (connectedness, common region) and intrinsic grouping cues (color similarity, shape similarity, proximity) on reaction time and coefficient of variation. The ANOVA of RTs revealed a significant main effect of grouping cue, F(4, 47) = 15.526, p < 0.001, BF10 > 100. The interaction between grouping cue and grouping condition was significant, F(4, 47) = 18.451, p < 0.001, BF10 > 100. The ANOVA of CV also revealed a significant main effect of grouping cue, F(4, 47) = 2.894, p = 0.034, BF10 > 100; however, its interaction with grouping condition was not significant.
For the main effect of grouping cue, reaction times were longer for extrinsic than for intrinsic grouping cues, while the coefficient of variation was lower for extrinsic than for intrinsic grouping cues, indicating that sensory precision was greater (less sensory noise) for extrinsic grouping cues (Fig. 2). The interaction between grouping cue and grouping condition was not significant for CV, as shown in Fig. 3. For all grouping cues, sensory precision was greater in the grouping condition than in the no-grouping condition, and the grouping effect of extrinsic grouping cues (connectedness and common region) was stronger than that of intrinsic cues. The interaction effect on RT was significant: for extrinsic grouping cues, there was no significant difference in RT between grouping and no-grouping conditions, but a difference was present for intrinsic grouping cues. Proximity and shape similarity grouping cues had a stronger grouping effect (Fig. 3). Moreover, we found that large numbers were underestimated for each grouping cue (Fig. 4), consistent with the results of previous studies [14].
Because the interaction between grouping condition and numerosity was significant, we next examined how RT and the coefficient of variation varied with numerosity in each condition. As shown in Fig. 5, both RT and CV increased linearly with numerosity; in the grouping condition, numerosities 6, 9, 12, and 16 had faster RTs and lower coefficients of variation than adjacent numerosities. In contrast, numerosities 7, 11, 13, and 17 had slower RTs and higher coefficients of variation than their neighbors (Fig. 5).

Influence of math knowledge
We compared grouping effects for persons with high, medium, and low levels of math knowledge. ANOVA of RTs revealed a significant main effect of math knowledge, F(2, 52) = 4.798, p = 0.012, BF10 > 100, and its interaction with grouping condition was also significant, F(2, 50) = 1.496, p = 0.004, BF10 > 100.
Regarding the main effect of math knowledge, when compared with the other two groups, the subjects in the high math knowledge group had faster RTs and lower coefficients of variation. Because of the significant interaction between math knowledge and grouping condition, we performed a simple effects analysis to further test the differences in grouping condition at different levels of math knowledge (Fig. 6).

Discussion
Our results showed that dividing items into several subgroups benefited estimation of the number of items. Furthermore, according to Gestalt theory, perceptual grouping cues can be divided into extrinsic and intrinsic cues [23-25]. Accordingly, this study explored whether different grouping cues had different influences on groupitizing. The results showed that although extrinsic grouping cues were associated with longer RTs, sensory precision was higher and the grouping effect was stronger for these cues.
The RT for extrinsic grouping cues was slower than that for intrinsic grouping cues, which is inconsistent with Luna et al. (2016), who found that extrinsic grouping cues, especially common region, were associated with faster RTs than other grouping cues [22,29]. In addition, Quinn and Bhatt (2015) found that young infants (4-6 months) were more sensitive to extrinsic grouping cues [26,28]. Future research should recruit preschool children or first-grade primary school children to explore whether children who have not studied mathematics, or who lack a complete magnitude representation system, show different groupitizing effects with different grouping cues. Although RTs were slower for extrinsic than for intrinsic grouping cues, the grouping effect of extrinsic grouping cues was stronger than that of intrinsic grouping cues in terms of sensory precision (Fig. 3). This may indicate that extrinsic grouping cues confer a strong advantage for groupitizing but that, compared with intrinsic grouping cues, the added connecting or enclosing lines cause more visual interference and require additional cognitive processing, leading to slower responses.
In recent years, significant progress has been made in the visual science of perceptual grouping. Recent studies have focused on the temporal processes and neural basis of intrinsic and extrinsic perceptual grouping [26,35]. For intrinsic grouping cues, grouping by proximity was found to be related to a positive component at occipital electrodes, whose amplitude peaks 100 to 120 ms after stimulus onset. Contour integration based on collinearity was found to emerge 130 ms after stimulus onset [36]. Grouping by similarity (shape or color) was found to appear much later: a negative occipito-temporal wave was activated more than 300 ms after stimulus onset [22,35]. In contrast, the neural basis of extrinsic grouping principles has received less attention. Montoro et al. (2015) reported the neural effects associated with grouping by common region. They found that common region grouping cues belong to the category of long-latency grouping principles, which primarily involve activity in extrastriate cortices [22,28]. Future research should continue to explore the time course and neural mechanisms underlying the grouping effects of intrinsic and extrinsic grouping cues.
Numerous studies [37-42] have found that when the dots in an array are connected by lines or placed in a closed area, the number of dots is underestimated. This phenomenon is termed the "connectedness illusion" [37-42]. Connectedness and enclosure are similar to the extrinsic grouping cues used in the present study (connectedness and common region). Some researchers have used topological invariance to explain the underestimation caused by connectedness and enclosure; they proposed that numerosity perception is influenced by topological invariants such as connectedness and enclosure, so that connecting and enclosing items leads to underestimation of numerosity [40]. Other studies suggested that two adjacent dots are treated as one numerical unit when connected or enclosed by lines [37,38]. Moreover, some researchers found that although the connectedness illusion can lead to underestimation, when observers are required to reach for targets quickly, reaching is not affected by the connecting lines [42].
Because many studies have found the connectedness illusion, in the present study we examined whether extrinsic grouping cues (connectedness and common region) also led to underestimation of number in grouping conditions. The numerosities 6, 9, 12, and 16 were selected for analysis because these numbers had low RTs and high precision (see Results). We calculated the mean values across all subjects of the Weber fraction for the numerosities 6, 9, 12, 16, for all grouping cues by grouping condition.
The results demonstrated that connectedness and common region grouping cues produced no significant underestimation; on the contrary, connectedness led to significant overestimation compared with the other grouping cues (see Fig. 7B). These results suggest that no connectedness illusion was present in our study. There are several possible reasons. First, regarding the task, previous studies mostly used a bisection or discrimination paradigm, which asks participants to determine which of two simultaneously or sequentially presented stimuli contains more dots. In the present study, participants were asked to report their estimates directly. Moreover, in this study, grouping and no-grouping conditions were equally distributed within each session, so participants estimated both grouped and ungrouped arrays in each session. Compared with the no-grouping condition, the grouped stimuli were divided into smaller subgroups, which facilitated estimation and further demonstrated that connectedness and common region had a strong groupitizing effect. Second, regarding the stimuli, in previous studies dots were usually connected by line segments or curves in pairs, or connected in series (Fig. 7A). Such stimuli may introduce an additional variable of common fate, because, whether connected by line segments or curves, items were arranged along the trajectory of the lines, and participants could easily perceive the connected items as a single whole. In the present study, however, the items in the connectedness condition were randomly distributed in the grid, and the line segments connected the center points of the items. The connected items may therefore have been perceived as a subgroup rather than as a single whole, which would not cause underestimation.
Furthermore, most previous studies involved extra lines in the unconnected condition (Fig. 7A). However, in this study, only the grouped conditions had additional connecting or enclosing lines, while the no-grouping condition had no additional lines. Regarding the mechanisms underlying the role of extrinsic grouping cues in grouping effects, different experimental paradigms could be adopted for further exploration in future research.
Interestingly, we found that RTs for grouping by shape similarity were significantly shorter than those for the other grouping cues (Fig. 3). Studies have shown that, in the absence of other visual information, it is easy for human beings to identify objects by shape [43-45]. Adults and children prefer to categorize novel objects according to shape when given conflicting color and texture cues. Shape features play a more important role in inductive reasoning than do color features [44,45]. Shape similarity is the first strategy used in inductive reasoning in early childhood [46]. Researchers presented subjects with reference stimuli defined by color, shape, and texture (such as square, blue, and wooden), and then presented two test stimuli with different shapes, a different color, and a different texture, so that children could judge whether the test stimulus was consistent with the reference stimulus [47]. The results showed that 2- to 3-year-old children chose shape as the basis of induction. Future studies might recruit developing children as participants to explore whether the grouping effect of shape similarity on quantity estimation differs between children and adults.
Regarding math knowledge, reaction times in the high math knowledge group differed significantly between grouping conditions, while the middle and low math knowledge groups did not exhibit such a difference, indicating that the groupitizing effect benefits individuals with high math knowledge the most. Many studies have demonstrated that an efficient ANS may be a prerequisite for the typical development of math skills [10,11]. Therefore, we speculate that the high math knowledge group had a more refined ANS, and that the groupitizing strategy was used automatically in quantity estimation. In the grouping condition, items were visually divided into subgroups, and since individuals with high levels of math knowledge could make better use of the groupitizing strategy, their RTs in the grouping condition were significantly faster than those in the no-grouping condition. For the middle and low math knowledge groups, although sensory precision was higher in the grouping condition, RTs did not differ between conditions. This may be because, in the grouping condition, using groupitizing strategies required them to employ more cognitive resources and more time. In the no-grouping condition, they could not use any strategy and could only make rough guesses based on their impressions. Thus, RTs were faster but accuracy was lower.
The present study found that, in the grouping condition, numerosities 6, 9, 12, and 16 were associated with faster reaction times and lower coefficients of variation than adjacent numerosities. This may be related to the specific configurations of those numerosities; Ciccione and Dehaene [4] found that for 5, 7, 11, and other such prime numbers, RTs were slower than for their neighbors, whereas for non-prime numbers, which could be subdivided into equal groups, RTs were faster than for their neighbors (Fig. 5).

Conclusion
The present study demonstrates that visually dividing an array into subgroups promotes quantity estimation. Moreover, for the first time, our research combined the groupitizing effect in numerosity estimation with Gestalt theory and demonstrated a difference between the groupitizing effects of extrinsic and intrinsic grouping cues, based on Palmer et al. [23-25]. The results suggest that numerosity estimation takes longer with extrinsic grouping cues, but that precision is higher with these cues, owing to a stronger groupitizing effect than with intrinsic grouping cues.

Limitations And Future Directions
Since the concept of "groupitizing" was proposed by Wender and Rothkegel (2000) [17] and Starkey and McCandliss (2014) [13], studies have continued to explore the effect of grouping. This study combined the grouping effect and perceptual grouping principles, thus extending the study of groupitizing and the field of perceptual grouping.
Recent studies have begun to explore the shared associative mechanisms between different perceptual features. For example, the theory of magnitude model [48,49] proposes that the human parietal cortex processes quantitative information about space, time, and number together to optimize action planning and execution. It is therefore necessary to explore the relationship between magnitude representation, space, and time. In this study, we only studied the grouping effect in space, but subsequent studies could test for differences in the grouping effect between intrinsic and extrinsic grouping cues in the time dimension.
The participants in this study were adults. Although they were divided into high, middle, and low math knowledge groups, differences in math level among adults are not very prominent. Many studies have found that numerosity perception precision improves greatly in the course of development and formal arithmetic learning, while in educated adults, symbolic mathematics abilities may have been stably mapped onto their basic non-symbolic representation, making this connection less obvious [49-51]. Future research could explore the differences in groupitizing effects between preschool children with a low number sense and adults, as well as between children with and without difficulties in math.
To date, there has been no electrophysiological or neuroscience study that explores the grouping strategies between intrinsic and extrinsic cues. Future research could be conducted from the perspective of electrophysiology to investigate the neurofunctional links between grouping strategies and intrinsic and extrinsic cues, which would delineate a possible neural hierarchical model for "groupitizing."

Figure: Results of extrinsic and intrinsic grouping cues. Reaction time and coefficient of variation (CV) for extrinsic and intrinsic grouping cues by grouping condition. ***p < 0.001, **p < 0.01, *p < 0.05.

Figure: Results for math knowledge. Reaction time and coefficient of variation (CV) in groups with high, medium, and low levels of math knowledge by grouping condition. ***p < 0.001, **p < 0.01, *p < 0.05.