When style matters: visual exploration is low dimensional and driven by intrinsic dynamics.

It is often assumed that we look at objects that are salient and behaviorally relevant, and that we pay attention differently depending on individual genetics, development, and experience. This view implies high interindividual variability in eye movements. In contrast, we show that 60% of the variance in the eye movements of more than a hundred observers looking at hundreds of different visual scenes could be summarized by a few components. The first component was not related to image-specific information and identified two kinds of observers during visual exploration: "static" and "dynamic". These viewing styles were accurately identifiable even when observers looked at a blank screen, and were described by the degree of similarity of the gaze-step distribution to a power law, which is thought to be a measure of intrinsic dynamics. This suggests that eye movements during visual exploration of real-world scenes are relatively independent of the visual content and may reflect intrinsic dynamics.


Introduction
The exploration of visual scenes through eye movements is a complex behaviour mediated by the sequential and recursive interactions of multiple cognitive processes. For many years it was thought that eye movements were predominantly guided by stimulus-driven factors such as the sensory distinctiveness of different objects in the visual field, an idea formalized in a highly influential saliency model by Itti and colleagues 1 . However, since the seminal studies of Yarbus 5 , it has been known that the patterns of eye movements depend not only on low-level features, but also on the behavioural relevance of stimuli in the visual scene, e.g. people, faces, etc., as well as on the goals of the observer. Therefore, current theories and computational models propose that visual exploration is guided by both sensory and cognitive signals 3,6,7 . In general, these accounts fit the classic view of the brain as a sensory-motor analyser whose activity is mainly driven by the analysis and transformation of sensory stimuli into motor decisions. However, a recent study comparing different visual exploration models showed that they account only for a small portion of the variance of eye movement patterns 8 . This suggests the presence of other, still unknown, mechanisms that drive eye movement exploration.
Part of the difficulty in explaining the variability of visual exploration might be related to individual differences. Indeed, observers exhibit consistent individual differences in eye movement parameters that generalize across tasks (e.g., visual search vs. fixation 9,10 ) or different versions of the same picture 11 . One study found that eye movement parameters were correlated across different laboratory tasks (e.g. sustained fixation vs. search vs. Stroop paradigm), and that the majority of variability across subjects could be summarized with a single factor putatively related to visual attention 12 . These results led to the idea that eye movement patterns may reflect an intrinsic or endogenous 'signature' relatively independent of visual input or goal 9 . Indeed, these patterns relate to individual cognitive styles 13 and personality traits 14 , and are in part under genetic influence 15 .
In this study we aimed to quantify the role of stimulus-driven vs. intrinsic factors in visual exploration by examining eye movement patterns in a large group of healthy participants while they viewed a large set of real-world scenes vs. when they viewed a blank screen devoid of any structured visual stimulus.
First, we asked if the variability of eye movements across subjects and visual scenes could be explained with a relatively low number of dimensions.
A low dimensionality independent of the visual content of the images would be consistent with the importance of endogenous factors. Second, we asked whether eye movement patterns during exploration were accounted for by the sensory features of the images, their semantic content, or the power-law distribution of gaze steps 16 . Power-law relations are found ubiquitously in nature and describe many complex phenomena such as earthquakes 17 , volcanic eruptions 18 , stock market fluctuations 19 , and the foraging behavior of many species 20,21 . The identification of power-law behavior in biological systems is thought to reflect intrinsic organizational constraints, e.g. anatomical connectivity or neural dynamics in the case of the brain 22-25 . Power-law scaling relations have also been found in eye movement patterns during visual search 16 . Finally, to further test the intrinsic dynamics of eye movements, we investigated whether specific visual exploration patterns could be identified during spontaneous visual exploration in the absence of visual stimuli, i.e., when looking at a blank screen.

Results
Healthy participants (n = 120) were recruited at the University of Padova, with n = 114 remaining after exclusions (see Supplementary Table 2 for demographic information). All participants had normal or corrected-to-normal (i.e., glasses, N = 54) vision. Participants (aged 19-34 years) were tested in a single experimental session lasting approximately two hours, during which their eye movements were tracked while they watched a blank screen or freely explored a set of 185 real-world scenes. These scenes were selected from a larger set of 36,500 pictures 26 (see Supplementary Fig. 1 for the flowchart used for selection) to be representative of the following categories: indoor vs. outdoor, which in turn were divided into natural vs. man-made. The content of the pictures had no emotional valence and half of them contained human figures (Supplementary Fig. 2 shows exemplars of each category). Participants were asked to look at each picture carefully, as they were told that they would be asked some questions later on, and, when ready, to advance to the next picture by pressing the spacebar on the computer keyboard (Fig. 1). A large set of eye movement features (58 in total) was extracted, including fixation duration and position, gaze step amplitude and direction, pupil diameter, etc. (Supplementary Table 3). A battery of behavioural tests and questionnaires was then administered to evaluate working memory, visuospatial memory, impulsivity, anxiety, and personality traits (see Supplementary Table 2 for a list of the measures).
All volunteers received 10 € for their participation.

Low Dimensionality In Visual Exploration
The first question we addressed is whether eye movement patterns during visual exploration are 'different' or 'similar' across individual observers. We examined the pattern of correlation across eye movement features and subjects by running a principal component analysis (PCA) on the scaled and mean-centred full set of features extracted from the gaze data acquired during the exploration of images.
A three-component solution accounted for 59% of the total variance (Fig. 2). We then performed a k-means cluster analysis splitting the sample into two clusters. The k = 2 clustering solution was chosen by comparing the similarity between k-means and hierarchical clustering labels obtained with different distance measures and values of k (see Supplementary Fig. 3 for details). Figure 3a shows the distribution of observers along the first three principal component (PC) scores. The best separation (ROC analysis accuracy = 99.9%, 95% C.I. [95.83-100] with a cut-off value of 0.69, AUC = 99.9%) was obtained along the PC1 score (Fig. 3b).
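As a schematic illustration, the dimensionality-reduction and clustering steps described above can be sketched as follows. This is a minimal sketch on simulated data, not the authors' analysis code: the matrix shape (114 observers x 58 features), the scaling, the three-component PCA, and the k = 2 k-means all follow the text, but the injected two-cluster structure is purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(114, 58))   # 114 observers x 58 eye movement features
X[:57] += 1.5                    # inject a two-cluster structure for the demo

Xz = StandardScaler().fit_transform(X)   # scale and mean-centre each feature
pca = PCA(n_components=3).fit(Xz)        # three-component solution
scores = pca.transform(Xz)               # PC scores per observer

# split the sample into two clusters along the PC scores
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(pca.explained_variance_ratio_.sum(), np.bincount(labels))
```

With real data, the variance explained by the three components and the separability of the two clusters would of course depend on the actual feature correlations.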
Participants with high PC1 scores were nicknamed "Static Viewers", because they showed a lower fixation rate but longer fixations. Participants with low PC1 scores were nicknamed "Dynamic Viewers", because they showed more frequent but shorter fixations (Fig. 3c).
More generally, static viewers also tended to explore images for longer, and showed on average larger-amplitude and more numerous gaze steps, more gaze flips, smaller pupil diameter, and a distribution of gaze steps more similar to a power law. Moreover, they tended to fixate less on spatial locations carrying more semantic and saliency information (see the Methods section and Supplementary Fig. 6 for details on the extraction of semantic and saliency information). Dynamic viewers showed the opposite pattern of features. Figure 4 characterizes the viewing styles in terms of individual features by the effect size of each variable (Cohen's d).
The robustness of this solution was tested by splitting the images into odd- and even-numbered subsets, computing a PCA in each subset, and then correlating the corresponding PC1 scores. We found a high degree of similarity (for all images vs. odd, all images vs. even, and even vs. odd images, all r values > 0.97; Supplementary Fig. 4). Furthermore, each participant's cluster label remained substantially the same when the cluster analysis was run on even (92.1%, i.e., 105/114) or odd (97.4%, i.e., 111/114) images.
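The odd/even split-half check can be sketched as below. The data are simulated: a latent per-observer trait stands in for the stable individual style, and the two feature matrices stand in for features computed from odd- and even-numbered images. Because the sign of a principal component is arbitrary, the sketch compares the absolute correlation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
trait = rng.normal(size=(114, 1))            # stable per-observer style
loadings = rng.normal(size=(1, 58))
base = trait @ loadings + 0.5 * rng.normal(size=(114, 58))
odd = base + 0.3 * rng.normal(size=(114, 58))    # features from odd images
even = base + 0.3 * rng.normal(size=(114, 58))   # features from even images

def pc1_scores(X):
    # PC1 score per observer on scaled, mean-centred features
    Xz = StandardScaler().fit_transform(X)
    return PCA(n_components=1).fit_transform(Xz).ravel()

r = np.corrcoef(pc1_scores(odd), pc1_scores(even))[0, 1]
print(abs(r))   # sign of a PC is arbitrary, so compare |r|
```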
As a further control analysis, we used PC1, PC2 and PC3 separately to reconstruct the original feature matrix and compared the similarity of each reconstruction to the original data (Supplementary Fig. 5A).
The most accurate reconstruction was obtained using PC1, compared to the other components. Overall, these findings support the low dimensionality of eye movement exploration patterns across many subjects and types of visual scene.
Relative influence of sensory, semantic, and endogenous variables on visual exploration styles

Next, we examined whether visual exploration eye movements across subjects were predicted by stimulus-driven or intrinsic factors. We used PC1 scores as the dependent variable in a linear regression model with four predictors: image saliency (SAL), semantic content (SEM), the Shannon entropy of gaze positions (ShEn), and the Kolmogorov-Smirnov distance of the gaze-step distribution from a power law (KSD). The model showed a significant effect of KSD (t = -3.79, p < .001; Fig. 6a and Supplementary Fig. 7) and a trend towards significance for ShEn (t = 1.76, p = .081). In contrast, SEM and SAL were not significant, even though the pictures differed significantly in their semantic and saliency content (Supplementary Fig. 6, bottom) and these factors would be expected to drive eye movements. See the Visual Exploration model in Supplementary Table 1 for further details.
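A regression of this form can be sketched as follows. The data are simulated so that only KSD carries signal, the predictor names follow the text, and the t-values are computed by hand from ordinary least squares rather than with a statistics package; the numbers are illustrative, not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 114
preds = rng.normal(size=(n, 4))          # columns: SAL, SEM, ShEn, KSD
pc1 = -0.5 * preds[:, 3] + rng.normal(scale=0.8, size=n)  # only KSD matters

X = np.column_stack([np.ones(n), preds])         # add intercept
beta, *_ = np.linalg.lstsq(X, pc1, rcond=None)   # OLS coefficients
resid = pc1 - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])        # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
tvals = beta / se                                # coefficient t-values
print(dict(zip(["const", "SAL", "SEM", "ShEn", "KSD"], tvals.round(2))))
```

In this simulated setting only the KSD coefficient comes out clearly significant, mirroring the qualitative pattern reported in the text.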

Control Analyses
The model was validated in a split-half design in which 57/114 participants were randomly selected to fit the model parameters, while the remaining 57 were used only for testing (i.e., prediction of PC1 scores).
This procedure was repeated 1,000 times and the Pearson's r coefficient was collected for each iteration to test the correlation between actual and predicted PC1 scores. All correlations were positive (97.4% of them were significant), with a mean Pearson's r value of .42 (SD = .078; Fig. 6b).
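The split-half validation loop can be sketched as below, again on simulated data. The 57/57 split follows the text; we run 200 rather than 1,000 iterations to keep the demo fast, and the resulting mean r is illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 114
X = rng.normal(size=(n, 4))                  # SAL, SEM, ShEn, KSD (simulated)
y = -0.5 * X[:, 3] + rng.normal(scale=0.8, size=n)

rs = []
for _ in range(200):                         # the paper used 1,000 iterations
    idx = rng.permutation(n)
    train, test = idx[:57], idx[57:]         # fit on one half, test on the other
    fit = LinearRegression().fit(X[train], y[train])
    pred = fit.predict(X[test])
    rs.append(np.corrcoef(y[test], pred)[0, 1])  # actual vs. predicted PC1

print(np.mean(rs), np.std(rs))
```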
Next, to rule out the possibility that the results were biased by the eye-tracker's relatively low spatial resolution (~ 0.2°, 120 Hz acquisition rate), we checked the similarity of eye movement patterns to power laws, as computed through the KSD, using different thresholds of gaze-step length (0.2°-8.1°). Specifically, we removed gaze steps smaller than each threshold, then recomputed the KSD and the linear regression model predicting PC1 values. This analysis showed that the contribution of KSD was stable across multiple thresholds (0.2°, 0.4°, 0.8°, 1.6°, 3.2°, 4.0°, 4.9°), eliminating the possibility that this effect was driven by small eye movements not detected by the eye-tracker (Supplementary Fig. 8).
In control analyses, we ran the same model on PC2 (loading on gaze step direction) and PC3 (loading on gaze step length). The full model (SAL, SEM, ShEn, KSD) indicated that KSD was predictive of PC2 (t = -2.96, p = 0.004), while SEM was predictive of PC3 (t = -2.45, p = 0.02). Again, we did not find a significant contribution of the SAL variable.
This analysis shows that the pattern of eye movements during visual exploration of scenes is explained by a few components (~ 60% of the variance across images and subjects). These components can be used to separate two styles of viewing (> 90% classification accuracy) that are not predicted by sensory salience. On the other hand, the visual exploration style was significantly predicted (~ 20% of the variance) by intrinsic dynamics captured by the similarity of the gaze-step length distribution to a power law.

Identification of visual exploration styles in the blank screen viewing condition
Given the significant influence of intrinsic eye movement dynamics on visual exploration, we asked whether the pattern of eye movements could be used to accurately classify participants during visual exploration of a blank screen (herein "blank screen viewing"). A positive result would strongly support the idea that intrinsic factors independent of visual analysis are important in controlling eye movement patterns. To test this hypothesis, we applied the same analysis pipeline, i.e. feature extraction and PCA, to 30 s of blank screen viewing data acquired prior to the presentation of the first image. It should be emphasized that subjects had not seen any of the images prior to the blank screen viewing observation period. Fourteen participants were removed from this analysis because they maintained steady fixation in the center of the screen and did not show any exploratory eye movements to the blank screen. The blank screen viewing data analysis was thus conducted on a sample of N = 100 subjects.
Not surprisingly, the order of components during blank screen viewing was not the same as during visual exploration. Fixation features that loaded on PC1 during visual exploration moved to a weak PC3 during blank screen viewing (7 out of 11 features, loading ≥ 0.2). Conversely, PC1 in blank screen viewing loaded on the maximum length and variability of gaze steps, as well as on the number of flips on the Y axis, features that were mainly related to PC2 and PC3 during visual exploration (6 out of 7 features, loading ≥ 0.2). This was also confirmed quantitatively by running a linear regression model with PC1 of blank screen viewing as the dependent variable, and PC1-3 of image viewing (as well as their interactions) as predictors. This model showed that PC3 during image viewing significantly predicted PC1 during blank screen viewing (t = 2.98, p = .004).
Next, we used blank screen viewing eye movement features to predict individual subject labels (Static vs. Dynamic viewers) using a Random Forest algorithm in a cross-classification design. That is, the algorithm was trained on features extracted in the blank screen viewing condition and tested on cluster labels extracted during the image-viewing task. The model showed an accuracy of 79% (p < .001; 95% C.I. [71.3-87.0]) in predicting cluster labels from features extracted from blank screen viewing (Fig. 7a).
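The cross-classification step can be sketched as follows: a Random Forest is trained on blank-screen features to predict the image-derived cluster labels, and accuracy is estimated with cross-validation. The features, labels, and resulting accuracy are all simulated and illustrative; the real analysis may have used a different validation scheme and hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 100
labels = (rng.random(n) < 0.5).astype(int)  # Static vs. Dynamic (from images)
blank = rng.normal(size=(n, 58))            # blank-screen viewing features
blank[labels == 1] += 0.8                   # make the two styles separable

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, blank, labels, cv=5).mean()
print(acc)
```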
Inspection of the between-subjects correlation matrix of eye movement features during visual exploration and blank screen viewing shows that individuals tend to correlate more with members of their own cluster than with members of the other cluster (Fig. 7b). Moreover, the structure of the between-subjects similarity in visual exploration (Fig. 7b, left matrix) significantly correlated with that in blank screen viewing (Fig. 7b, right matrix; Pearson's r = .37, p < .001). These findings show that the visual exploration style found during free viewing of natural scenes is identifiable even in the absence of visual stimuli.
Importantly, to ensure that participants were actually exploring the images, after the free-viewing phase they were asked to recall and describe a subset of images that had been repeated five times. The average number of freely recalled details was 59.97 (SD = 20.5; range: 22-141; 2.6% false memories) across the five images.

Visual Exploration Styles Are Related To Cognition And Personality
The final analysis investigated whether visual exploration styles (as indexed by PC1 scores) were related to individual characteristics, namely demographic information (i.e. age, sex, education), cognitive scores (i.e., inhibition, visuospatial and verbal memory) and personality traits (i.e. Big Five scores).
Indeed, an emerging body of research has suggested that eye movements are influenced by personality traits 14 . To validate the model, we applied the split-half procedure described above with 1,000 iterations (Fig. 6e).

Discussion
In this study we measured eye movements in a large sample of healthy participants during visual exploration of many real-world scenes. We found that eye movement parameters were strongly correlated across pictures and participants, with three components explaining roughly 60% of the variance of eye movements and fixations. This low dimensional structure of eye movement patterns, especially the duration and number of fixations (PC1), identified two viewing styles: static and dynamic. The interindividual variability of PC1 scores was predicted by the similarity of gaze-step distributions to a power law, an intrinsic property of non-linear dynamical systems, but not by the saliency or semantic content of the visual scenes. In addition, static and dynamic viewers could be identified from their pattern of eye movements while looking at a blank screen, and they differed in their cognitive profile.
Herein, we discuss two main results: the low dimensionality of eye movements during visual exploration, and the role of intrinsic dynamics vis-à-vis sensory salience and semantic information in guiding eye movements.
The low dimensionality of eye movements is not an entirely novel result. Poynter and colleagues 12 , in a study of n = 40 subjects, found that eye movement parameters were correlated across different laboratory tasks (e.g. sustained fixation, search, Stroop), and could be summarized with a single factor, putatively related to visual attention. Their factor loaded on the duration and frequency of fixations, which is also an important component of our PC1. Using a larger set of features, we separated two clusters of observers, static and dynamic, who differed not only in rate and duration of fixation, but also in pupil diameter, spontaneous viewing time, amplitude and number of gaze steps, and number of gaze flips (Fig. 4). The assignment to one cluster or the other was stable (> 90% accuracy) across different sets of images.
Static viewers showed less frequent but longer fixations, explored images for longer, made larger and more numerous gaze steps and more gaze flips (i.e., changes of gaze direction), and showed smaller pupil diameter, as well as a distribution of gaze steps closer to a power law. Moreover, they spent less time on parts of the images that were rich in semantic and saliency information. Dynamic viewers showed the opposite pattern. Intuitively, static viewers better approximated a power-law distribution because they made many small-amplitude and relatively few long-range gaze steps, while dynamic viewers made a more balanced combination of short and long gaze steps.
The covariance of fixation duration and gaze step distribution is consistent with an interdependent control process 35 . At the neural level, fixation and saccadic activity are inter-related at multiple levels in the brain (frontal eye field, superior colliculus, brainstem) 36-38 . At the cortical level, different neural systems, the dorsal and ventral attention networks 39,40 , control focal processing vs. re-orienting to novel locations.
Visual processing occurs during fixations, hence the longer fixation times of static viewers may imply more in-depth processing of fewer stimuli. Conversely, dynamic viewers may look more rapidly, and more superficially, at more items in a visual scene. This interpretation is also consistent with the observation that dynamic viewers tend to be more impulsive.
The presence of low dimensionality and "styles" in human cognition that define inter-individual variability is consistent with other recent findings. For instance, a recent study classified individuals along the Big Five dimensions of personality based on patterns of eye movements in real life (walking on campus) 14 .
Similarly, studies of human mobility have revealed two distinct styles 41 when walking from one location to another in a city: "Returners", who tend to walk back and forth nearly always taking the same trajectory, and "Explorers", who more frequently explore new locations in their route. The authors also showed a social bias in the mobility profile, with a tendency to engage socially more with individuals who have a similar mobility profile.
In the field of reward, we have recently shown that the temporal discount functions of a large group of healthy subjects (n = 1200) show a Pareto-optimal distribution that defines three archetypes: people who always wait for larger rewards; people who always take rewards immediately; and people who take immediately only when the reward is large 42 . The existence of different styles may reflect trade-offs in cognitive or physical traits that have been selected during evolution to maximize specialized performance, similar to what has been shown in other fields such as animal behavior 43 or biological circuits 44 .
Next, we asked what controls the low dimensionality of eye movement patterns across subjects. Sensory salience was quantified using a classic saliency model 1 , while semantic information was quantified with a deep learning neural network 26 . These variables were used as predictors of PC1 scores, along with a measure of visual scanning topography (the Shannon entropy of eye movements) and the distance of each individual's eye movement distribution from a power law (Kolmogorov-Smirnov distance; see the Methods section). The presence of power-law dynamics in behavior (including eye movements), as well as in neural systems 22 , is thought to reflect intrinsic dynamics 29,30 . Surprisingly, we found that saliency and semantic information did not significantly predict PC1 scores (nor PC2). It is important to note that this result is not due to averaging of saliency or semantic information across pictures, which would leave only "common" eye movements. Rather, estimates of saliency and semantic information were computed fixation by fixation, therefore taking into account the eye movement patterns in each picture separately. Our results are consistent with recent studies suggesting that free viewing is not best predicted by saliency models 45 . Saliency models may be more important when the task strongly constrains the search strategy (e.g. searching for a red target) but seem to lose predictive power in free exploration conditions.
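A Kolmogorov-Smirnov distance to a power law can be computed as sketched below: fit the power-law exponent above a lower cut-off with the standard maximum-likelihood estimator, then take the maximum deviation between the empirical and fitted cumulative distributions. The choice of cut-off (xmin) and the exact fitting procedure used in the paper may differ from this minimal version.

```python
import numpy as np

def ksd_to_power_law(steps, xmin):
    """KS distance between step lengths (>= xmin) and a fitted power law."""
    x = np.sort(steps[steps >= xmin])
    n = x.size
    alpha = 1.0 + n / np.sum(np.log(x / xmin))  # MLE for the pdf exponent
    cdf = 1.0 - (x / xmin) ** (1.0 - alpha)     # power-law CDF above xmin
    ecdf = np.arange(1, n + 1) / n              # empirical CDF
    return np.max(np.abs(ecdf - cdf))

# Synthetic step lengths with a genuine power-law tail yield a small distance:
rng = np.random.default_rng(5)
pl_steps = rng.pareto(1.5, 5000) + 1.0          # Pareto with xmin = 1
print(ksd_to_power_law(pl_steps, xmin=1.0))
```

A distribution far from a power law (e.g., narrowly peaked step lengths) would instead yield a large distance, which is the per-subject quantity (KSD) entered into the regressions above.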
In contrast, similarity to a power-law distribution predicted a significant fraction of PC1 score variability in our free-viewing task. Power laws have been found ubiquitously in physics, as well as in the brain, where they are thought to reflect neurobiological constraints imposed by anatomical connectivity and neural dynamics. Power laws have been described in fMRI, EEG/MEG, local field potentials, and single-unit activity 46-48 . Moreover, behavioral performance fluctuations also follow a power law, including eye movements 35 , and tend to correlate with slow and fast neuronal activity. Interestingly, the power-law exponents of behavior and neural activity are correlated across individuals both during task and rest 49 . Therefore, we posited that a similar link may occur between eye movement dynamics and neural dynamics, even spontaneously at rest (i.e., during blank screen viewing).
This implies that resting dynamics have an influence on how we move our eyes during visual exploration, thus potentially revealing stable, biologically determined traits of the observer 50 .
This was confirmed in our recordings of eye movements to a blank screen. We found in this case three components that explained a similar amount of variance (~ 50%), with the most variance explained by gaze step amplitude (gaze step length, PC1: 29% of variance) and the least explained by fixation duration and frequency (PC3: 9% of variance). Hence, the features defining the three components resembled those found during visual exploration, but their relative weight differed. During exploration, eye movement variability was mainly explained by fixation duration and frequency; during blank screen viewing, it was mainly explained by the amplitude of gaze steps. This indicates that similar components are active in both situations, but that visual exploration gently shifts the attractor space of eye movement parameters. This finding is in line with the similarity of brain activity topography at rest and during tasks 51,52 , with the relative correlation within and between networks adjusted during different tasks 52-54 . This is consistent with the idea that spontaneous neural dynamics function as a spatiotemporal prior constraining the parameter space of task-evoked activity 55,56 .
Our results are consistent with a previous small-scale study (n = 15) in which visual exploration eye movements were compared to eye movements recorded in darkness 9 . However, eye movements in darkness could reflect several factors not directly related to spontaneous visual exploration dynamics, such as posture-related information 57 or memory-related processing 58 . Also, pupillary responses are not controlled in darkness. Other small-scale studies used a similar blank screen condition during a memory retrieval task 59 or while participants heard sentences about a previously presented scene 60 . To the best of our knowledge, our work represents the first large-scale study in which spontaneous eye movement dynamics are compared to those recorded during exploration of many real-world visual scenes, and the first to show that characteristics of eye movements at rest (i.e., during blank screen viewing) can be used to classify different styles of visual exploration.
Regarding the present study's limitations, the sampling rate of the eye-tracker (i.e., 120 Hz) did not allow us to investigate in detail the dynamics of microsaccades, which are an important mechanism of fixation. Visual exploration could also be studied in more natural conditions, without a chin-rest, using algorithms for head-movement correction or wearable eye-trackers. The blank screen viewing observation period was short (30 seconds prior to the presentation of the first image), so we cannot rule out that some degree of expectation influenced the results. Also, longer blank screen viewing periods would allow the detection of slower fluctuations of eye movement patterns, as well as of pupillary responses that are related to vigilance fluctuations and could significantly impact intrinsic activity 61 .
In conclusion, eye movement features during free visual exploration are correlated across subjects and cluster people into two phenotypes depending on their style of exploration. The degree to which the distribution of gaze-step lengths resembled a power law was the strongest predictor of the visual exploration style. We speculate that this could indicate the existence of neurological constraints, e.g. patterns of anatomical connectivity and/or neural dynamics, that drive visual exploration behaviour and predict individual differences.
A related implication of this work is its potential as a biomarker in clinical populations. For instance, some authors have shown that neurodegenerative disorders are associated with specific patterns of eye movement features 62 , but these studies have mainly used laboratory tasks (e.g., antisaccade tasks), with some investigations during reading 63,64 , and have not focused on intrinsic dynamics. It is possible that alterations of intrinsic eye movement patterns may represent an early biomarker of degeneration.

Subjects
A sample of 120 students was recruited at the University of Padova (mean age = 23.4, SD = 2.42; 49 M). All participants had normal or corrected-to-normal (i.e., glasses, N = 54) vision. We excluded individuals with excessive data loss, defined as less than 50% of usable data in more than 25% of trials (n = 3 individuals excluded). Moreover, two further participants were excluded due to the interruption of the experimental session, for a panic attack in one case and eye irritation in the other. Finally, one participant was excluded because colour-blindness was revealed after the experimental session was completed.
Thus, 114 out of 120 participants were included in the final sample (mean age = 23.52, SD = 2.45, 67 F). All participants signed an informed consent form before the experimental session, and after it they received a remuneration of 10 € for their participation. The study was approved by the Ethical Committee of the University of Padova.

Experimental Design

Each participant took part in a single session comprising five phases (total duration: 2 hours). The first phase was called "blank screen viewing", as participants were asked to look at a grey screen without any stimulation for 30 seconds. Participants were simply told to move their gaze freely within the screen boundaries.
In the second phase ("Free visual exploration"), a set of 185 images of scenes selected from the Places365 database (see the Stimuli paragraph for details about the dataset and the stimuli selection) was shown on the computer screen. Participants were instructed to look freely at the pictures in a self-paced design (minimum 2,000 ms, maximum 10,000 ms; 1,500 ms ITI) and to move to the next trial by pressing the spacebar. Moreover, they were informed that they would be asked some questions at the end of the task.
After the first half of the images was presented, participants had a 10-minute break to relax and rest their eyes.
Once all the pictures had been presented, participants had another 5-minute break before the third phase ("Recall"), in which they were asked to recall the five repeated images. Participants were requested to describe each image for 3 minutes as accurately as possible while their verbal description was recorded by means of a voice recorder. During the recall phase, participants were presented with the same grey screen adopted in phase 1. For the purposes of the present paper, only phases 1 and 2 have been considered.

Stimuli
The stimuli used in the present experiment were real-world scenes selected from the Places dataset 26 , a scene dataset designed to train artificial systems for image recognition. Specifically, the dataset we used in this experiment is the validation set of the Places365-Standard dataset (the dataset can be downloaded here: http://places2.csail.mit.edu/download.html). All images in the dataset were categorized according to three hierarchical levels. Level 1 was the most general and subdivided the images into three categories: indoor, outdoor man-made, and outdoor natural. In Level 2, each of the Level 1 categories was split into four to six subcategories (e.g., for the Level 1 category "indoor", examples of Level 2 subcategories are "shopping and dining" and "home or hotel"). Finally, Level 3 encoded 365 specific categories describing the type of scene (e.g., art gallery, bakery shop, etc.). For the purposes of the present work, only the Level 1 categorization was used; moreover, images were coded along an additional dimension, namely whether or not they depicted human beings. Thus, six categories were finally considered (i.e., indoor man-made with humans, indoor man-made without humans, outdoor man-made with humans, outdoor man-made without humans, outdoor natural with humans, outdoor natural without humans) and 30 images were chosen for each category (e.g., outdoor man-made with humans; Supplementary Fig. 2). The final set of images comprised 180 items, with the addition of 5 further images for the recall phase. These images were taken from all the above-described categories except outdoor natural without humans, as this type of image yielded a very low number of recallable details. Details about the image selection process are reported in Supplementary Fig. 1.

Assessment Of Behaviour And Personality
Participants were tested after the eye-tracker data acquisition was completed.
For the cognitive assessment we decided to focus on memory (visuospatial long-term memory, working memory) and executive functions (inhibition/impulsivity), as these domains seem to mainly influence visual behaviour 65 .
The cognitive tests employed to assess the described domains were the Digit Span (forward and backward) 66 , the brief version of the Stroop Test 67 , and the Rey-Osterrieth Complex Figure (ROCF) 68 .
Moreover, we asked participants to fill in a form sent by e-mail which included three questionnaires.
One of these was a personality questionnaire based on the Five Factor Model 69 . None of the participants was discarded for an excessive state anxiety score. Moreover, since some participants were students of psychology, we checked their knowledge of the administered tests using a three-point scale (0 = No knowledge; 1 = Theoretical knowledge; 2 = Theoretical and Practical knowledge).
No effects of previous knowledge emerged on the subsequent models.

Analysis
Eye-tracker data acquisition, pre-processing and features extraction.
The eye-tracker adopted was the Tobii T120 (Tobii Technologies, Danderyd, Sweden), which acquires gaze data at a 120 Hz sampling rate (i.e., one sample every 8.3 ms).
Participants were seated at a fixed distance of 60 cm from the screen, and their head movements were limited by a chin-rest.
Raw eye-tracking data were minimally pre-processed. We included in the analysis only gaze samples in which both eyes were assigned the highest validity value (i.e., a validity code of 0, indicating that the eye was found and that tracking quality was good). Then, we extracted a large set of features encoding various characteristics of eye movements to describe visual behaviour in an exhaustive way, as done in other recent studies 14 .
For each participant, a set of 58 features was extracted (Supplementary Table 3) which encoded four main sources of information: 1. Fixations (e.g., mean duration of fixations): statistics over fixations are frequently employed in eye-tracking studies 32 . In the present study, fixations were detected using a velocity-based threshold algorithm 75 (detection threshold lambda = 15), which is considered adequate and robust across several testing conditions 76 . From a cognitive point of view, fixations represent information processing and their duration is correlated with the depth of cognitive processing 77 .
2. Pupil diameter (e.g., mean pupil diameter of left eye) which is not only related to environmental light and vigilance, but also to a variety of cognitive processes such as attention 78 and cognitive load 79 .
3. Gaze steps (e.g., mean gaze step length, number of flips on the x and y axes) in raw gaze data, i.e., the Euclidean pixel distance between two consecutive gaze positions. Notably, this metric avoids the distinction between saccades and microsaccades, as both types of eye movements are thought to be controlled by the same neuronal mechanisms 38 .
Moreover, for fixations and gaze steps, some features were extracted which encoded their temporal course (e.g., mean fixation duration in the first, second, third and fourth quarter of exploration time).
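As an illustration, the gaze-step features described above (step length as the Euclidean pixel distance between consecutive gaze samples, and the number of direction flips on each axis) could be computed as in the following sketch; function and variable names are ours, not those of the original analysis pipeline:

```python
import numpy as np

def gaze_step_features(x, y):
    """Gaze-step features from raw gaze coordinates (in pixels).

    A gaze step is the Euclidean distance between two consecutive
    gaze samples; a "flip" is a reversal of the movement direction
    along one axis (sign change of the per-sample displacement).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    steps = np.hypot(np.diff(x), np.diff(y))            # step lengths
    flips_x = int(np.sum(np.diff(np.sign(np.diff(x))) != 0))
    flips_y = int(np.sum(np.diff(np.sign(np.diff(y))) != 0))
    return {"mean_step": float(steps.mean()),
            "flips_x": flips_x,
            "flips_y": flips_y}
```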
Eye-movements data reduction.
A Principal Components Analysis (PCA) was performed to reduce the number of features to a smaller number of meaningful components. Oblique rotation was adopted because of the correlation between the features. To select the optimal number of components we adopted Kaiser's criterion 80 . For the blank screen viewing phase, a separate PCA was run following the same procedure and the same features as before, with the exception of exploration-time-related features.
This is because, in the blank screen viewing condition, exploration time was essentially the same for all participants.
Since fourteen participants showed missing data in some fixation-based features (e.g., due to a single central fixation), only 100 participants were included in this analysis (Supplementary Fig. 9).
Interestingly, the most important features in the blank screen viewing condition were mainly those included in the third component extracted from the image-viewing task. This suggests that fixation-related features were less important than in the image-viewing condition, while more weight was assigned to pupil diameter and step length.
Finally, the same set of features was also extracted from the eye-movement data acquired during the blank screen viewing condition.
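A minimal sketch of the component-selection step (Kaiser's criterion: retain components whose eigenvalue on standardized features exceeds 1) might look as follows; it omits the oblique rotation, and all names are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def kaiser_n_components(features):
    """Number of principal components retained by Kaiser's criterion.

    Features are standardized so that each eigenvalue can be compared
    against 1 (the variance of a single standardized feature).
    """
    z = StandardScaler().fit_transform(features)
    eigenvalues = PCA().fit(z).explained_variance_
    return int(np.sum(eigenvalues > 1.0))
```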
Detection of clusters in visual behaviour and their interpretation.
Preliminarily, the Silhouette method 81 was applied to identify the optimal number of clusters in a data-driven manner, and suggested the existence of 2 clusters in our data. Then, a k-means cluster analysis with a k value of 2 was carried out. The reliability of the two-cluster solution was tested by comparing different clustering solutions obtained from k-means and hierarchical clustering algorithms, using several distance metrics. The similarity between the clustering solutions was quantified by means of the Jaccard index (Supplementary Fig. 3) and revealed that the 2-cluster solution was the most reliable across different methods. Figure 2 shows the participants' scores in the three-dimensional space defined by the first three principal components, coloured according to the cluster each participant belonged to. The PC1 scores accounted well for the differences between the two clusters, which lay along a continuum. Subsequently, we wanted to investigate whether the different visual exploration styles were associated with differences in the topography of the visual exploration pattern (i.e., entropy), in the distribution of gaze steps (i.e., more power-law-like) and in the informational content of fixations (i.e., whether subjects paid more attention to saliency or semantic information).
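The cluster-detection steps above (silhouette-based choice of k, k-means clustering, and Jaccard similarity between alternative clustering solutions) can be sketched as follows, assuming scikit-learn; the names and the range of k tested are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def optimal_k(scores, k_range=range(2, 7), seed=0):
    """Choose the k that maximizes the mean silhouette width."""
    best_k, best_s = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(scores)
        s = silhouette_score(scores, labels)
        if s > best_s:
            best_k, best_s = k, s
    return best_k

def jaccard_index(labels_a, labels_b):
    """Jaccard similarity between two clusterings: of all point pairs
    co-clustered in at least one solution, the fraction co-clustered
    in both (invariant to label permutations)."""
    a = np.equal.outer(labels_a, labels_a)
    b = np.equal.outer(labels_b, labels_b)
    iu = np.triu_indices(len(labels_a), k=1)   # each pair counted once
    return np.sum(a[iu] & b[iu]) / np.sum(a[iu] | b[iu])
```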
First, for each participant, 185 heatmaps were created (i.e., one for each presented picture), representing the empirical gaze maps encoding the normalized number of times the gaze was centred on each pixel.
The Shannon entropy was calculated for each heatmap.
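A possible implementation of this step (binning gaze samples into a normalized pixel map and computing its Shannon entropy, ignoring spatial smoothing) is sketched below; names and binning choices are ours:

```python
import numpy as np

def gaze_heatmap_entropy(gx, gy, width, height):
    """Normalized gaze heatmap and its Shannon entropy.

    Each bin counts how often the gaze fell on that pixel; the map is
    normalized to a probability distribution before computing
    H = -sum(p * log2(p)), with empty bins ignored.
    """
    heat, _, _ = np.histogram2d(gx, gy, bins=[width, height],
                                range=[[0, width], [0, height]])
    p = heat / heat.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```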
Second, the Euclidean distance covered in each gaze step (i.e., the "gaze step length") was calculated and the distribution of step lengths was computed. Then, the subject-specific gaze step length distribution was fitted to a power-law distribution and their similarity was quantified by means of the Kolmogorov-Smirnov test, a well-known nonparametric test used to distinguish between distributions 82 . Specifically, in our case this test was used to investigate whether an empirical probability distribution (i.e., the subject-based distribution of gaze step lengths) differed from a reference distribution (i.e., the power-law distribution), by quantifying the distance between the two (Kolmogorov-Smirnov Distance, KSD). The lower the KSD, the higher the similarity between the empirical distribution and the reference power-law distribution. Importantly, this procedure was applied to each individual gaze step distribution independently, leading to a different power-law exponent for each participant.
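As a sketch of this step, a continuous power law can be fitted per participant by maximum likelihood and compared with the empirical gaze-step distribution via the KS distance; the fixed xmin and the Clauset-style estimator are our assumptions, not necessarily the choices of the original analysis:

```python
import numpy as np

def powerlaw_ks_distance(steps, xmin=1.0):
    """Fit p(x) ~ x^(-alpha) for x >= xmin by maximum likelihood and
    return (KSD, alpha), where KSD is the max absolute difference
    between the empirical and fitted CDFs. Lower KSD means the
    gaze-step distribution is more power-law-like."""
    x = np.sort(np.asarray(steps, float))
    x = x[x >= xmin]
    n = len(x)
    alpha = 1.0 + n / np.sum(np.log(x / xmin))        # MLE exponent
    fitted_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)    # power-law CDF
    ecdf = np.arange(1, n + 1) / n                    # empirical CDF
    return float(np.max(np.abs(ecdf - fitted_cdf))), float(alpha)
```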
Third, we wanted to quantify the influence of saliency and semantic information in driving visual exploration of real-world scenes. To this end, we created two types of heatmaps for each image: (1) a saliency map and (2) a semantic map. These maps were used to quantify, fixation by fixation, the amount of saliency and semantic information included. We therefore calculated the mean amount of saliency and semantic information fixated by each subject. Supplementary Fig. 6 shows a graphical explanation of this procedure. All computed heatmaps were spatially smoothed using a 2° full width at half maximum (FWHM) Gaussian kernel.
A linear regression model was built with PC1 scores (obtained in the image-viewing task) as dependent variable, and the measures described above as predictors. The full model was tested on the whole sample, then its reliability and generalizability were tested by randomly splitting the sample in two halves, tting the model on one half (i.e., the training set) and testing its prediction (i.e., PC1 score) on the other half data (i.e., the test set). This procedure was repeated 1,000 times and each time the correlation between actual and predicted PC1 values was collected (Fig. 5B).
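The split-half validation scheme can be sketched as follows (ordinary least squares on random half-splits, collecting the correlation between predicted and actual scores on the held-out half); all names are illustrative:

```python
import numpy as np

def split_half_validation(X, y, n_iter=1000, seed=0):
    """Repeatedly split the sample in half, fit an OLS model on one
    half, predict the other half, and record the Pearson correlation
    between predicted and actual values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rs = []
    for _ in range(n_iter):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2 :]
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return np.array(rs)
```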
Then we built a new linear regression model to investigate whether visual exploration styles (PC1 scores) were predicted by demographic information (i.e., age, sex, education), cognitive measures (i.e., inhibition, visuospatial and verbal memory) or personality traits (i.e., Big Five scores). The full regression model (i.e., including all predictors; Supplementary Table 4) was tested and validated by applying the same procedure used before (i.e., split-half validation with 1,000 iterations; Fig. 5E).
Machine-learning classification analysis of cluster labels from blank screen viewing eye-movement features.
We investigated whether the features extracted during blank screen viewing were informative about the visual exploration styles that emerged while watching real-world scenes. To do so, we trained a Random Forest classifier to predict the two cluster labels (static vs dynamic, as determined in the image-viewing condition) from the blank screen viewing multivariate pattern of eye-movement features. We used a 10-fold cross-validation design, i.e., data were split into 10 folds, nine of which were used as the training set and one was left out as the test set. This procedure was repeated for 10 iterations, until each fold had been used once as the test set, resulting in a mean accuracy value indicating the proportion of participants correctly labelled.
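A minimal version of this decoding analysis, assuming scikit-learn (hyperparameters such as the number of trees and the stratified fold scheme are our guesses, not reported choices):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def blank_screen_decoding(features, cluster_labels, seed=0):
    """10-fold cross-validated accuracy of a Random Forest predicting
    image-viewing cluster labels (static vs dynamic) from blank-screen
    eye-movement features."""
    clf = RandomForestClassifier(n_estimators=500, random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, features, cluster_labels,
                             cv=cv, scoring="accuracy")
    return scores.mean()
```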
Moreover, we computed a between-subjects correlation matrix of the eye-movement features, thus testing the interindividual similarity in the pattern of features (Fig. 6B). As shown in the figure, the correlation is higher for participants falling within the same cluster (i.e., static or dynamic viewers) than between participants with different visual exploration styles. Then, to test the reliability of this pattern of between-subjects similarity across the blank screen viewing and image-viewing conditions, the Pearson correlation between the two matrices was computed.
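The between-subjects similarity analysis could be sketched as follows: build a subject-by-subject correlation matrix per condition, then correlate the off-diagonal entries of the two matrices (names are illustrative):

```python
import numpy as np

def similarity_matrix_agreement(feat_blank, feat_image):
    """Subject-by-subject correlation matrices of eye-movement feature
    patterns in the two conditions, plus the Pearson correlation
    between their off-diagonal entries (reliability of the
    between-subjects similarity structure)."""
    sim_blank = np.corrcoef(feat_blank)    # subjects x subjects
    sim_image = np.corrcoef(feat_image)
    iu = np.triu_indices_from(sim_blank, k=1)   # off-diagonal entries
    r = np.corrcoef(sim_blank[iu], sim_image[iu])[0, 1]
    return sim_blank, sim_image, float(r)
```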