Modeling the scaling properties of human mobility in virtual space

People are increasingly involved in online activities. Online activities can be regarded as movements in virtual space, such as jumping from webpage to webpage while surfing online, switching channels while watching TV, and browsing commodities while shopping online, which can affect information propagation, innovation spread, social activities, etc. Most previous efforts have been devoted to modeling the scaling properties of human mobility in physical space. Few studies aim to establish a unified and integral model to understand the fundamental dynamics underlying human virtual mobility. In this paper, we study human mobility in virtual space empirically and theoretically based on two datasets involving TV watching and online shopping and attempt to answer three unsolved issues. First, human virtual mobility shares common features, supported by the fact that striking agreements appear in the scaling properties of both datasets. Second, there exists a universal rule governing an individual’s choice in virtual mobility, which is distinct from that in the real world due to travel restrictions. Third, there exists a unified model incorporating the behavior rule unique to virtual space under the framework of Exploration and Preferential Return, which can be used to reproduce the scaling properties of virtual mobility. We reveal the mechanism behind human virtual mobility through consistent scaling properties and develop a corresponding dynamic model based on empirical data.


Introduction
In the real world, individuals constantly travel from one location to another, forming trajectories. Uncovering the statistical patterns that characterize these trajectories [1][2][3][4] is important for disease control [5,6], congestion alleviation [7,8], information propagation [9,10], etc. The key to understanding human mobility dynamics is to establish models that reproduce the scaling properties of empirical data, such as the waiting time distribution $P(\Delta t)$, the distance distribution $P(\Delta r)$, and the visitation frequency $f_r$, where $\Delta t$ denotes the time spent by an individual at a location, $\Delta r$ is the distance covered by an individual between consecutive sightings, and $f_r$ is the frequency $f$ of the $r$th most visited location [11]. Moreover, the model must be self-consistent: it should reflect the behavior rules of travel microscopically and be unified with the various scaling properties macroscopically. A representative work is the Exploration and Preferential Return (EPR) model proposed in 2010 [12]. Two generic mechanisms governing human trajectories, exploration and return, are introduced to account for the empirically observed scaling exponents. In many subsequent studies, rules within the EPR mechanism were added or adjusted to make predictions more consistent with empirical data. Representative models include the radiation EPR model [13,14], the memory-preferential random walk model [15], and the EPR model conditional on current location [16]. These quantifiable human mobility models have evolved from general random walk models [17] to models based on travel mechanisms or memory effects, which match reality better [12,18]. The fields involved have also expanded from human travel to urban mobility [19], income segregation [20], and even mobility in virtual space [21].
With the rapid advances in information and communication technologies (ICT), people are increasingly involved in online activities. Online activities can be regarded as virtual movements [21,24]. Examples include continuous browsing of commercial websites in a single online session, continuous channel switching on a digital TV, and consecutive posting on social networking sites. Facilitated by massive online datasets, an increasing number of studies are devoted to revealing the intrinsic patterns underlying human virtual mobility. The quantities of interest in virtual space are similar to those in the physical world, including the waiting time distribution $P(\Delta t)$ ($\Delta t$ is the time spent by an individual at a site), the visitation frequency $f_r$ (the frequency $f$ of the $r$th most visited site), and the number of distinct sites $S(t)$ (the number of sites visited). Several studies have carried these quantities over from the real world to virtual space. First, the visitation frequency $f_r$ of human virtual mobility was found to follow Zipf's law, and an EPR model that disregards the waiting time $\Delta t$ was constructed to reproduce the scaling properties of $f_r$ [21]. Then, $f_r$ was further confirmed to follow Zipf's law, while the power-law form of $S(t)$ observed in the real world was called into question. Moreover, a model integrating the preferential attachment mechanism and the EPR framework was established to interpret the exponents of these quantities [22]. Further, the scaling properties of real travel and virtual movements were compared. It was found that the waiting time distribution $P(\Delta t)$ of human virtual mobility is heavy-tailed, and a model was proposed to link the critical exponents characterizing the spatial dependencies in human mobility and social networks [23]. Recently, there have been discussions on the scaling properties of virtual movement at different scales. Reddit posting data show that $f_r$, $S(t)$, and $P(\Delta t)$ follow power laws at the community level [24], while cross-app usage data indicate power-law behavior of $f_r$ and non-power-law behavior of $S(t)$ [25].
However, there is still a lack of an empirical, unified, and integral dynamic model for human virtual mobility, which is manifested in a series of unsolved issues. First, most studies involve the scaling properties of only one specific online activity, without comparing exponents across different datasets [24]. By contrast, for real-world travel the empirical scaling exponents have been shown to be highly uniform across dollar-bill tracking [1], mobile-phone data [26][27][28], taxi data [29], etc. Second, the relationship between the waiting time $\Delta t$, the number of distinct sites $S(t)$, and the visitation frequency $f_r$ is not fully discussed. Some models ignore the waiting time $\Delta t$ and take the number of movements $n$ as the only key parameter to derive the scaling exponents [21], and some other models explain only the characteristics of the visitation frequency $f_r$ [22,25]. Third, mobility dynamics are mostly described as a memory-based random walk process [30][31][32][33]. The various assessments and quantifications of the memory effect rest on their own assumptions and are not verified against empirical data [22,24]. Therefore, three questions regarding human mobility in virtual space naturally arise. Does human mobility in virtual space share common features? Does online behavior data contain generic rules? Does there exist a unified model that can reproduce the properties of the waiting time distribution $P(\Delta t)$, the number of distinct sites $S(t)$, and the visitation frequency $f_r$? In this paper, we discuss human virtual mobility based on two typical datasets, from phenomena to rule and further to model, to answer these three questions.
Changes in virtual locations, that is, online activities, are called virtual mobility [21,24]. In the past few years, network technology has made it possible to track humans' online footprints. Phone records, online shopping records, web browsing records, and other datasets have provided new momentum to study human mobility in virtual space.
Although these datasets vary greatly in terms of their fields and sources, the results seem to agree on a number of quantitative characteristics of human mobility [34].
In this paper, we use two datasets from different domains to uncover the scaling properties characterizing individual mobility. Dataset D1 contains TV viewing records of 30,000 anonymous users in a large city in China from July 2015 to September 2015. Dataset D2 is obtained from a large multicategory online store capturing 60,000 anonymous users' click records in October 2019. In D1, a user's switch from one channel to another is considered a movement in virtual space. In D2, a movement between commodities consists of two consecutive browsing activities (see Fig. 1). Unlike travel in the real world, travel in virtual space is instantaneous [35]. The memory effect in real travel can last for months or years, but viewing/clicking effects after one day or even one hour are almost independent of previous ones. If a user's 3-month viewing records are regarded as one trajectory, users' footprints are too random to truly characterize scaling properties. Therefore, we segment the sequence for each user and take continuous records as one trajectory [21]. A user may have multiple trajectories in the observation window (see Supplementary Section S2).
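The segmentation step described above can be illustrated with a minimal sketch. It assumes that a trajectory ends whenever the idle gap between two consecutive records exceeds a threshold; the record layout, the function name `split_into_trajectories`, and the 30-minute `max_gap` are hypothetical choices for illustration, since the actual criterion is given in Supplementary Section S2.

```python
from typing import List, Tuple

# Hypothetical record layout: (start_time, end_time, site), times in seconds.
Record = Tuple[float, float, str]


def split_into_trajectories(records: List[Record],
                            max_gap: float = 1800.0) -> List[List[Record]]:
    """Split one user's records into trajectories of continuous activity.

    A new trajectory starts when the idle time between the end of one
    record and the start of the next exceeds max_gap (an assumed value).
    """
    trajectories: List[List[Record]] = []
    current: List[Record] = []
    for rec in sorted(records, key=lambda r: r[0]):
        if current and rec[0] - current[-1][1] > max_gap:
            trajectories.append(current)
            current = []
        current.append(rec)
    if current:
        trajectories.append(current)
    return trajectories


# Toy viewing log: three records in one sitting, then two more much later.
records = [
    (0.0, 60.0, "ch1"), (60.0, 400.0, "ch2"), (410.0, 500.0, "ch1"),
    (9000.0, 9100.0, "ch3"), (9100.0, 9500.0, "ch1"),
]
trajs = split_into_trajectories(records)
print([[site for _, _, site in t] for t in trajs])
```

A single user thus contributes several short trajectories, each of which is analyzed separately.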

Empirical scaling properties
Previous studies have discovered a series of scaling properties of virtual mobility in online data. For example, mobile phone data show that the waiting time distribution $P(\Delta t)$ is heavy-tailed, that is, $P(\Delta t)\propto \Delta t^{-1-\beta}$ [23]. Taobao shopping records and Reddit records both indicate that the frequency $f$ of a user visiting a given location and its rank $r$ follow Zipf's law: $f_r\propto r^{-\zeta}$ [22,24]. These characteristics seem to imply that mobility in virtual space can be explained by models for the real world, especially the EPR model [12]. However, some studies have questioned whether the number of distinct sites $S$ in virtual space follows $S(t)\propto t^{\mu}$ as in the real world. Some researchers proposed $S(t)\propto \frac{A}{B+t^{\mu}}$, while others believed that $S(n)\propto n^{\mu}$, where $n$ is the number of steps. Another significant statistical characteristic, $P_{new}$, the probability of exploring new locations, is also controversial and is assumed to follow $P_{new}\propto S^{-\gamma}$, $P_{new}\propto \frac{A}{B+t}$, or $P_{new}\propto n^{-\gamma}$ [21,22], etc. We comprehensively explore the scaling properties using two datasets involving TV watching and online shopping. First, we measure the waiting time distribution $P(\Delta t)$. In the TV viewing record dataset (D1), $\Delta t$ is the viewing time of the audience on a TV channel, calculated as the difference between the start time and end time of a record. In the online shopping dataset (D2), $\Delta t$ is the browsing time for a product, approximated by the time interval between two consecutive browsing records. We find that $P(\Delta t)\propto \Delta t^{-1-\beta}$ with $\beta_1 = 0.78\pm0.11$ and $\beta_2 = 0.88\pm0.08$ (see Fig. 2(a)), which is consistent with the form in the real world, where $\beta = 0.8\pm0.1$ [12]. Then, we discuss the number of distinct sites $S(t)$, finding that $S(t)\propto t^{\mu}$, where $\mu_1 = 0.52\pm0.007$ and $\mu_2 = 0.57\pm0.011$ (see Fig. 2(b)). In the real world, $\mu = 0.6\pm0.02$ [12]. It is worth noting that $\beta\neq\mu$, indicating that travel in virtual space does not follow a continuous-time random walk (CTRW) [31].
Note that there is a slowdown in exploring new locations at large timescales because $\mu < 1$, suggesting a decreasing tendency to visit unvisited locations. Another quantity closely related to $S$ is $P_{new}$, the probability of exploring a new location. $P_{new}$ deviates from the expectation $P_{new}\propto S^{-\gamma}$ (see Fig. 2c), suggesting that it may be inappropriate to directly apply the classical EPR model to movements in virtual space. Finally, we plot the visitation frequency $f_r$ against rank $r$, which can be written as $f_r\propto r^{-\zeta}$ with $\zeta_1 = \zeta_2 = 0.94\pm0.02$ (see Fig. 2d). The visitation pattern of virtual mobility is uneven, but this unevenness is weaker than that in the real world, where $\zeta = 1.2\pm0.1$.
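As a rough illustration of how such quantities can be measured on a single trajectory, the sketch below computes the number of distinct sites after each step, the rank-ordered visitation frequency, and a log-log least-squares slope. The function names are hypothetical, the distinct-site curve is indexed by step rather than by elapsed time, and a log-log fit is only one simple way to estimate an exponent; it is not claimed to be the fitting procedure used for Fig. 2.

```python
import math
from collections import Counter
from typing import List, Sequence


def distinct_sites_curve(trajectory: Sequence[str]) -> List[int]:
    """Number of distinct sites seen after each step of the trajectory."""
    seen = set()
    curve = []
    for site in trajectory:
        seen.add(site)
        curve.append(len(seen))
    return curve


def rank_frequency(trajectory: Sequence[str]) -> List[float]:
    """Visitation frequency f_r, ordered from the most to least visited site."""
    counts = Counter(trajectory)
    total = len(trajectory)
    return [count / total for _, count in counts.most_common()]


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


trajectory = ["a", "b", "a", "c", "a", "b", "d", "a"]
print(distinct_sites_curve(trajectory))
print(rank_frequency(trajectory))

# Sanity check on an exact power law f_r = r^(-0.94): the slope is -zeta.
ranks = [1, 2, 4, 8, 16]
print(loglog_slope(ranks, [r ** -0.94 for r in ranks]))
```

On real data the frequencies would be aggregated over many trajectories before fitting, and the fit would be restricted to the range where the log-log plot is approximately straight.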
Naturally, we can draw two conclusions. First, the empirical characteristics or exponents in virtual space are not the same as those in physical space, suggesting that the mechanism behind travel in virtual space differs from that in reality. Second, there are similarities between the scaling properties of the two datasets. We observe similar behavior of $P_{new}$ and $\zeta_1 = \zeta_2$. Although the time-related exponents differ between the datasets ($\beta_1\neq\beta_2$, $\mu_1\neq\mu_2$), $\mu = \frac{2}{3}\beta$ holds for both. This suggests that the scaling properties are determined by both the unified behavior mechanism and the system features.

Human mobility model in virtual space
Before introducing our model, we first emphasize a recognized premise: the waiting time distribution $P(\Delta t)$ in virtual space is heavy-tailed, as addressed by previous research [23]. However, there is still a lack of a unified interpretation from which to derive the other scaling properties: $S(t)$, $P_{new}$, and $f_r$. Within our framework, individual human trajectories are characterized using two generic mechanisms: exploration (visiting a new location) and preferential return (visiting an already visited location). At each time step $t$, the individual explores a new site or returns to a previously visited site. The probability of exploration is generally assumed to follow a power law, $P_{new} = \rho S^{-\gamma}$ [12]. For the probability $P_r$ that an individual moves to a previously visited site, there are many configurations, such as $P_r = f_r\propto \frac{1}{r}$ and $P_r\propto \frac{1+\lambda k_i}{1+\lambda k_n}$ [12,22]. However, the above forms are only assumptions. The forms of $P_{new}$ and $P_r$ in virtual mobility have not been verified.
Particularly, in our model, the specific rules governing the return are summarized from the empirical data. We name it the Virtual Exploration and Preferential Return (VEPR) model. We first verify the relationship between $P_r$ and $r$. It is worth noting that $P_r\propto\log_{10}\frac{1}{r}$, which is different from $P_r\propto\frac{1}{r}$ in the real world (see Fig. 3a). This discrepancy indicates that the memory effect of return in virtual space is weaker. In the real world, considering time and distance, travel costs are relatively high, so the memory effect is particularly strong. People often return to a few locations, such as home and company, manifested as $P_r\propto\frac{1}{r}$. However, there are no such restrictions in virtual space, where almost every movement is free and instant. Even infrequently visited sites have a certain probability of return traffic, shown as $P_r\propto\log_{10}\frac{1}{r}$ (see Fig. 3b). Given this relation, we use the ordinary least squares (OLS) method to obtain the coefficient $k$ of the equation $P_r\approx k\log_{10}\frac{1}{r}+c$. However, the coefficient $k$ varies with $S$, the number of distinct locations visited previously (see Fig. 5a). This means that $P_r\propto\log_{10}\frac{1}{r}$ is still not the most concise form, which is also not conducive to deriving scaling exponents theoretically. Therefore, we need to rescale the parameters in the regression equation. A surprising finding is that $k$ and $\frac{1}{S}$ are almost perfectly linear (see Fig. 4a). Therefore, we change the equation to $P_r\propto\frac{1}{S}\log_{10}\frac{1}{r}$ for regression. There is no significant correlation between the new coefficient and $S$. For the intercept, we find that it is proportional to $\log_{10}S$ (see Fig. 4b). Another surprising finding is that the new coefficient and the proportionality constant of the intercept are approximately equal (see Fig. 4c). Combining these relations, the return probability can be written as

$$P_r\approx\frac{\lambda}{S}\log_{10}\frac{S}{r}+\sigma. \qquad (2)$$

The OLS results of Eq. (2) show that there is a linear relationship between $P_r$ and $\frac{1}{S}\log_{10}\frac{S}{r}$ at a 95% confidence level (see S4 for detailed OLS results). The values of $\lambda$ and $\sigma$ under different $S$, obtained by Eq. (2), obey a normal distribution with $\lambda_1 = \lambda_2 = 1.44\pm0.03$ (see Fig. 4e), $\sigma_1 = 0.0026\pm0.0003$, and $\sigma_2 = 0.0040\pm0.0023$ (see Fig. 4f), and are uncorrelated with $S$ (see Fig. 4d). The equality $\lambda_1 = \lambda_2$ illustrates that the rules governing people's return in human virtual mobility are common. $\sigma_1$ and $\sigma_2$ are close to 0, consistent with the previous approximation that the coefficient and the proportionality constant of the intercept coincide. In conclusion, the probability of returning to the $r$th location is $P_r\approx\frac{\lambda}{S}\log_{10}\frac{S}{r}+\sigma$ (see S4 for the detailed process of rescaling). After rescaling, all points of $P_r$ versus $\frac{1}{S}\log_{10}\frac{S}{r}$ fall near the straight line with a slope of 1.44 (see Fig. 5b).
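The rescaled regression can be sketched as follows. This is a minimal illustration on synthetic, noise-free data generated from the rescaled return rule with the reported D1 values, so the fit simply recovers the inputs; the helper names and the pooled fit over several values of S are assumptions made for brevity, whereas the text above describes fits carried out for each S.

```python
import math
from typing import List, Tuple


def rescaled_predictor(S: int, r: int) -> float:
    """Regressor (1/S) * log10(S / r) for the r-th ranked of S visited sites."""
    return math.log10(S / r) / S


def ols_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic return probabilities built from the reported D1 parameters.
TRUE_LAMBDA, TRUE_SIGMA = 1.44, 0.0026
xs, ys = [], []
for S in range(5, 41, 5):
    for r in range(1, S + 1):
        x = rescaled_predictor(S, r)
        xs.append(x)
        ys.append(TRUE_LAMBDA * x + TRUE_SIGMA)

lam_hat, sigma_hat = ols_fit(xs, ys)
print(lam_hat, sigma_hat)
```

With empirical return probabilities in place of the synthetic ones, the slope plays the role of λ and the intercept the role of σ.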
We develop our VEPR model according to Eq. (2). A typical step for a person browsing in virtual space is schematically illustrated in Fig. 6. An individual can perform one of the two complementary processes at each step: return or exploration. If the individual chooses to return to a previous location, the visitation probability $P_r$ correlates with rank $r$, expressed as Eq. (2). Alternatively, the individual visits a new location with probability $P_{new} = 1 - P_{ret} = 1 - \sum_{r=1}^{S} P_r$. In what follows, we show that the individual mobility model incorporating preferential return and exploration is sufficient to explain a series of scaling properties.
Fig. 3 The rules governing preferential return summarized from empirical data. (a) Verification of the regression forms of $P_r$ and $r$.
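To make the two complementary probabilities concrete, the sketch below evaluates the return rule for every rank and obtains the exploration probability as the remainder. The function names and the default parameter values (the reported D1 estimates) are illustrative assumptions, and the clipping to the interval from 0 to 1 is a safeguard added here rather than part of the stated model.

```python
import math
from typing import List


def return_probabilities(S: int, lam: float = 1.44,
                         sigma: float = 0.0026) -> List[float]:
    """P_r = (lam / S) * log10(S / r) + sigma for ranks r = 1..S."""
    return [lam / S * math.log10(S / r) + sigma for r in range(1, S + 1)]


def exploration_probability(S: int, lam: float = 1.44,
                            sigma: float = 0.0026) -> float:
    """P_new = 1 - P_ret, where P_ret sums P_r over all visited sites."""
    p_ret = sum(return_probabilities(S, lam, sigma))
    # Clipping is an added safeguard for extreme parameter choices.
    return min(1.0, max(0.0, 1.0 - p_ret))


probs = return_probabilities(10)
p_new = exploration_probability(10)
print([round(p, 4) for p in probs])
print(round(p_new, 4))
```

The return probability decreases with rank, and the least visited site keeps a small but nonzero chance of being revisited as long as σ is positive.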

Theoretical analysis
Our VEPR model has two parameters, $\lambda$ and $\sigma$, which control the user's probability of returning to a previously visited location and determine the tendency of the user to explore a new location. We aim to obtain the analytic forms of the three characterizing quantities mentioned above. After a waiting time $\Delta t$ drawn from $P(\Delta t)\propto\Delta t^{-1-\beta}$, the user can either return to a previously visited location with probability $P_r\approx\frac{\lambda}{S}\log_{10}\frac{S}{r}+\sigma$ or visit a new location, chosen from the locations not yet visited, with probability $P_{new}$.
The probability of exploring a new location $P_{new}$. Obviously, $P_{new} = 1 - P_{ret}$, where $P_{ret}$ is the sum of the probabilities of returning to the previously visited locations. The general expression for $P_{new}$ is derived in Supplementary S4.
The number of distinct sites $S(t)$. We note that the probability that an individual moves to a new location is the rate at which $S$ grows with $n$, that is, $P_{new} = \frac{dS}{dn}$, where $n$ is the number of discrete moves the individual made up to time $t$. Given the heavy-tailed waiting time distribution $P(\Delta t)\propto\Delta t^{-1-\beta}$, time $t$ scales with the number of moves $n$ as $t\propto n^{\frac{1}{\beta}}$ [12], indicating that $S(t)$ follows $S(t)\propto t^{\mu}$ with $\mu = \frac{2}{3}\beta$. Since $\beta_1 = 0.78$ and $\beta_2 = 0.88$, we have $\mu_1 = 0.52$ and $\mu_2 = 0.57$, in agreement with the exponents of the empirical data.
The visitation frequency $f_r$. We introduce an intermediate quantity $m_r$, the number of visits to the $r$th location. Note that at each step, the probability that $m_r$ increases is the probability that the $r$th site is returned to, that is, $\frac{dm_r}{dn} = P_r$, with $P_r$ as in Eq. (2). Because $P_r$ discussed here is for one site and $\sigma$ is a small quantity close to 0, we further approximate $P_r$ by neglecting $\sigma$. The solution, Eq. (6), is $m_r(S)\approx\frac{3\lambda}{2\ln 10}\left(S^{\frac{1}{2}}\ln\frac{S}{r}-2S\,\dots\right)$. Eq. (6) predicts $\zeta = \frac{3\lambda}{2\ln 10}$, so $\zeta_1 = \zeta_2 = \frac{3\times1.44}{2\ln 10} = 0.94$, which is in excellent agreement with the empirical value.
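The chain of steps above can be written out as a short worked derivation. The starting form $P_{new}\approx\frac{4}{3}S^{-1/2}$ is an assumption made here for illustration: it is one closed form that reproduces both the relation $\mu=\frac{2}{3}\beta$ and the prefactor $\frac{3\lambda}{2\ln 10}$ quoted in the text, while the expression actually derived for $P_{new}$ is the one in Supplementary S4. The integration constant $C$ is left unspecified.

```latex
% Assumption (not taken from the source): P_new \approx (4/3) S^{-1/2}.
\begin{align*}
\frac{dS}{dn} &= P_{new} \approx \frac{4}{3}\,S^{-1/2}
  \;\Longrightarrow\; \frac{2}{3}\,S^{3/2} = \frac{4}{3}\,n
  \;\Longrightarrow\; S = (2n)^{2/3}, \\
t &\propto n^{1/\beta}
  \;\Longrightarrow\; S(t) \propto t^{2\beta/3},
  \qquad \mu = \tfrac{2}{3}\beta, \\
\frac{dm_r}{dS} &= \frac{dm_r/dn}{dS/dn}
  \approx \frac{\dfrac{\lambda}{S\ln 10}\,\ln\dfrac{S}{r}}{\dfrac{4}{3}\,S^{-1/2}}
  = \frac{3\lambda}{4\ln 10}\,S^{-1/2}\ln\frac{S}{r}, \\
m_r(S) &\approx \frac{3\lambda}{2\ln 10}
  \left(S^{1/2}\ln\frac{S}{r} - 2\,S^{1/2}\right) + C, \\
\frac{3\lambda}{2\ln 10} &= \frac{3\times 1.44}{2\times 2.3026} \approx 0.94 .
\end{align*}
```

The fourth line follows from $\int S^{-1/2}\ln\frac{S}{r}\,dS = 2S^{1/2}\ln\frac{S}{r} - 4S^{1/2}$, which can be checked by differentiating the right-hand side.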
Note that the exponents $\beta$ of the waiting time distribution $P(\Delta t)$ are different in the two datasets. The exponents $\mu$ of $S(t)$, which depend on $t$, are also different. However, $\mu = \frac{2}{3}\beta$ holds for both datasets. Moreover, the scaling properties of $P_{new}$ and $f_r$ are identical. This reveals that for different virtual movements, the selection mechanism behind each individual is the same, but because waiting times differ between viewing, browsing, etc., $\beta$ and $\mu$ vary with the scenario. However, the underpinning is unified.

Numerical simulation
We numerically simulate our VEPR model to verify the scaling laws of three characterizing quantities. Note that the waiting time distribution P( t) is determined by the specific system, and we take it as a premise and set β = 0.78 according to the TV watching data. We use the setting of 20000 trajectories in a space of 300 sites, that is, N = 20000 and M = 300. λ is the most critical parameter in our model, and we set λ = 1.44 for all S according to the empirical results.
The results, which are shown in Fig. 7, indicate that the three characterizing quantities are consistent with the empirical results. For further quantitative analysis, we conduct an independent-samples t-test for each pair of empirical and simulated values of $S(t)$, $P_{new}$, and $f_r$, based on the 30,000 viewing records and 20,000 simulation trajectories. An empirical value is colored green if the P value is less than 0.05 and red otherwise. More than 80% of point pairs pass the t-test, supporting the effectiveness of our simulation (see S6 for detailed t-test results). Moreover, we note that there is a slight deviation between the simulation and the empirical results when $S/r$ is small or large. When $S/r$ is small, only a few points can be used for regression. When $S/r$ is large, few people watch or browse many sites in one trajectory, so the sample size is small. In both cases, the fitted $\lambda$ deviates from the mean value $\bar{\lambda}$ (also reflected in Fig. 4d). Therefore, for small or large $S/r$, the simulation results obtained by substituting $\bar{\lambda}$ for the true $\lambda$ are biased. In addition, the number of distinct sites $S$ visited by individuals is much smaller than $M = 300$, showing that each step of each individual follows the principles of our model without exhausting the available sites, which is also in line with a virtual space that has almost no size restriction. Thus far, our VEPR model has been validated in simulation and theory.
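A single simulated trajectory under these settings might be generated along the following lines. This is a minimal sketch and not the simulation code behind Fig. 7: the function name, the Pareto sampler used for the heavy-tailed waiting times, the uniform choice among unvisited sites, the tie-breaking between equally visited sites, and the value of σ are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def simulate_trajectory(n_steps: int, beta: float = 0.78, lam: float = 1.44,
                        sigma: float = 0.0026, n_sites: int = 300,
                        rng: Optional[random.Random] = None
                        ) -> List[Tuple[float, int, int]]:
    """Simulate one walker; returns (time, site, distinct sites so far)."""
    rng = rng or random.Random()
    visits: Dict[int, int] = {}   # site -> number of visits
    t = 0.0
    history = []
    for _ in range(n_steps):
        # Waiting time with density proportional to dt^(-1-beta), dt >= 1.
        t += rng.paretovariate(beta)
        S = len(visits)
        # Rank 1 is the most visited site; ties keep first-visit order.
        ranked = sorted(visits, key=visits.get, reverse=True)
        weights = [max(0.0, lam / S * math.log10(S / r) + sigma)
                   for r in range(1, S + 1)]
        p_ret = min(1.0, sum(weights))
        if S == 0 or (S < n_sites and rng.random() >= p_ret):
            # Exploration: pick uniformly among sites not yet visited.
            site = rng.choice([s for s in range(n_sites) if s not in visits])
        else:
            # Preferential return: rank r is chosen in proportion to P_r.
            site = rng.choices(ranked, weights=weights)[0]
        visits[site] = visits.get(site, 0) + 1
        history.append((t, site, len(visits)))
    return history


rng = random.Random(0)
hist = simulate_trajectory(200, rng=rng)
print("distinct sites after 200 steps:", hist[-1][2])
```

Repeating this for N = 20000 walkers and aggregating the distinct-site counts, exploration events, and rank-ordered visit counts would give simulated counterparts of the three characterizing quantities.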

Discussion
Movements in virtual space have attracted increasing academic attention. We explore virtual mobility dynamics based on two online datasets from different fields. We find striking consistency between the two kinds of virtual movement by measuring the scaling properties, and we summarize a universal choice mechanism behind virtual travel from empirical data. We also develop an individual mobility model called the Virtual Exploration and Preferential Return (VEPR) model. Compared with previous studies, we comprehensively discuss the scaling characteristics across datasets. Our results can be summarized as follows. (i) We find that the scaling properties of virtual mobility are jointly determined by the unified mechanism and system features. The characteristics of the probability of exploring a new location, $P_{new}$, are the same for TV watching and online shopping. The exponents of the visitation frequency $f_r$ are also the same. The equation $\mu = \frac{2}{3}\beta$ holds for both datasets, where $\beta$ is the exponent of the waiting time distribution and $\mu$ is the exponent of the number of distinct locations $S(t)$. (ii) We summarize a specific rule that governs individuals' behavior from the empirical data. In virtual space, the memory effect is $P_r\propto\log_{10}\frac{1}{r}$. Compared with $P_r\propto\frac{1}{r}$ in the real world, the visitation probability is more uniform, which is in agreement with the smaller travel restriction in virtual space. (iii) We obtain a more general choice mechanism by rescaling the coefficients to be independent of $S$. The equation is $P_r\approx\frac{\lambda}{S}\log_{10}\frac{S}{r}+\sigma$, which reflects the return rule and facilitates theoretical derivation. (iv) We incorporate the rule into the EPR framework to establish our VEPR model. The three scaling properties can be reproduced in theory and simulation, which supports the model's validity.
In conclusion, we extract the rule of virtual travel from empirical data, and our individual mobility model based on the rule reproduces the scaling properties in theory and simulation. At the same time, the virtual mobility's features can also be well reflected by the mechanism of our model. The same macro scaling properties and micro behavior mechanism behind watching TV and online shopping inspire further exploration of whether there is a unified underpinning behind other virtual movements such as web browsing, online communication, and social interactions. Additionally, our work also contributes to solving practical problems. It is feasible to establish the mobility model for each individual according to his/her λ to predict his/her digital trajectory, which can be applied to make personalized recommendation in practice.
While we treat all users' trajectories equally for regression to obtain the rule governing return, users are heterogeneous in behavior. The model incorporating memory heterogeneity may reproduce much richer statistical properties. In addition, it is worthwhile to obtain each user's λ by regression to understand his mobility style. For example, a large λ indicates that the individual prefers to return to previous locations, while a small λ represents the fact that he is an explorer. This key parameter has potential applications for current important issues such as traffic optimization and online recommendation. Generally, it is of profound significance to extend the model considering heterogeneity at both individual and group levels.

Data availability
1. Dataset D1: This anonymized dataset represents 3 months of viewing records from 30 thousand users in 2015. The dataset cannot be disclosed for confidentiality reasons.
2. Dataset D2: This dataset contains behavior data for 60 thousand users from a large multicategory online store in October 2019. The dataset is free and available at https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store.