Emergence of urban growth patterns from human mobility behavior

Cities grow in a bottom-up manner, leading to fractal-like urban morphologies characterized by scaling laws. The correlated percolation model has succeeded in modeling urban geometries by imposing strong spatial correlations; however, the origin of the underlying mechanisms behind spatially correlated urban growth remains largely unknown. Our understanding of human movements has recently been revolutionized thanks to the increasing availability of large-scale human mobility data. This paper introduces a computational urban growth model that captures spatially correlated urban growth with a micro-foundation in human mobility behavior. We compare the proposed model with three empirical datasets, discovering that strong social interactions and long-term memory effects in human movements are two fundamental principles responsible for fractal-like urban morphology, along with the three important laws of urban growth. Our model connects the empirical findings in urban growth patterns and human mobility behavior. The study shows that a memory-aware and socially coupled human movement model can reproduce urban growth patterns at the macro level, providing a bottom-up approach to understand urban growth and to reveal its connection to human mobility behavior.

O ver a century ago, urban scientists envisioned that the city of tomorrow would have symmetric and regular morphologies as a result of tight top-down city planning rules that aim to optimize the accessibility of city function and space use in city development 1,2 .However, the increasingly available urban data suggest that cities grow in a bottom-up manner, calling for understandings of its micro-foundation [2][3][4] .Three fundamental empirical laws were later discovered 2,5,6 : first, the size distribution of the city follows a scaling law with an exponent of around -2.0, which implies that large cities are much rarer than small towns 5 .Second, urban population grows super-linearly with its area due to the intense competition for spaces 7,8 .Finally, the density of occupied urban areas decreases exponentially with the radial distance (r) to city centers [9][10][11] .A computational physics model (that is, diffusion-limited aggregation) was applied to model urban growth as an aggregation of physical particles 7,12 .Further works showed that correlated percolation is a better alternative to explain the emergence of the aforementioned laws 5 .A key finding of the correlated percolation model is the requirement for strong spatial correlation in urban growth to reproduce the scaling relations 13 .As a result, researchers proposed to model city growth as two processes: the emergence of new clusters and the growth of existing clusters 14 .A more recent work proposed a diffusion-limited gravitation model that combines diffusion-limited aggregation with the stochastic gravitation model 15 to capture spatial correlation; the model assigned the aggregating particles a stopping probability when it is closed to the developed urban sites 16 .
Although these models successfully explain urban morphology, they all model city growth as the stochastic processes of physical particles, which leaves the micro-foundation that roots in human behavior largely unknown.Here we develop a computational urban growth model based on human movements; the model suggests that strong social interaction and long-term memory effects in human movements are two fundamental principles governing urban growth.

Results
Human movement models.Our understanding of human movements has been revolutionized over the past decade thanks to the availability of large-scale human mobility datasets 17,18 .Existing human movement models can be classified into a paradigm of four categories based on the mechanisms of social interaction and long-term memory (see Fig. 1a).Specifically, a human mobility model is considered socially interactive if one individual's movement affects the others; a human mobility model is considered memory-aware if an individual's future mobility behavior is affected by his/her historic movements.Category-1 models treat human movements as randomly moving particles without interactions.Brownian movement is one of such models where an individual's displacements are normally distributed 19 .Unlike physical particles, empirical data suggest human movements are characterized by a fat-tailed jump-size distribution, satisfying a power law, where P( − → v | − → v 0 ) is the transition probability from location − → v 0 to − → v , and d = 2 for two-dimensional space 17,20 .The exponent α is observed at around 0.55 ± 0.05 (ref. 21).The fact that the transition probability decreases with distance characterizes the cost of travel distance of human movements, that is, most of the time people travel only over short distances, whereas occasionally people take longer trips.Neglecting social interactions and long-term memory effects, equation (1) suggests human movements follow the Lévyflight model, in which the evolution of population density ρ( − → v , t) follows the fractional diffusion equation, where (−Δ) α/2 is a fractional Laplacian and D is a diffusion constant (see the Methods for details).Nevertheless, both the Brownian movement and Lévy-flight models predict a uniform population distribution when time t approaches infinity, in contrast to empirical observations 9 .
where ρ 0 is the (inverse) coupling constant.In addition to the fat-tailed jump-size distribution equation (1), the Gravity model equation (3) also requires that the transition probability increases linearly with the population at the destination − → v (ref. 23).This mechanism accounts for a mean-field background attractiveness rooted in social interaction, for example, highly populated locations often offer more social opportunities 8 .The social attractiveness provides a potential mechanism to reproduce the spatial correlation in the correlated percolation model.Unfortunately, we find that the diffusion process of the Gravity model follows the same fractional diffusion equation (2) for the Lévy-flight model (see the Methods for details), that is, it predicts a uniformly distributed urban population at a large value of t.More recently, research explored the correlation between human mobility and social network, finding that human movements are largely determined by underlying social ties [24][25][26] ; however, these models often require inputs of social network data measured from experiments, leaving the origin of these correlations unexplained.Unlike Category-1 and Category-2, in which individuals move freely, empirical data found notable recurrent visitation patterns in human movements 21 .To explain these observations, the individual mobility model (IMM) re-treats human movements as a two-stage return-exploration process to account for long-term memory effects 21 .In particular, a preferential return mechanism is imposed, that is, the probability of returning to a previous location − → v i , which is proportional to its historic visitation frequency, f( − → v i ).
Such a long-term memory return process slows human diffusion drastically.In particular, IMM predicts that the typical traveling distance l t follows a logarithmic growth as where A is the total number of visited locations 21,27 .Logarithmic growth is a key feature of human movements, characterizing the anomalous ultra-slow diffusion and home range effect 28 .More recently, the d-EPR model generalizes the IMM by introducing a background field where an individual explores a new location with a probability proportional to its population density 29,30 .Although IMM and d-EPR successfully capture individual movements on a daily basis, individuals move independently and do not interact with each other.Moreover, the background field in the d-EPR model is static and does not evolve with time.These models therefore cannot capture the dynamic interactions among individuals and, consequently, are not capable of reproducing urban growth patterns (see Supplementary Section 1 for more details).
Neither social interactions (Category-2) nor long-term memory effects (Category-3) alone reproduces the spatial correlation proposed in the correlated percolation model; thus, a natural question is whether the remaining Category-4 models (mobility models that integrate social interaction and long-term memory effects) can predict urban growth patterns.Although Category-4 models are largely unexplored, there are few exceptions.A recently developed GeoSim model integrates the memory-aware IMM with a social network where each individual is more likely to explore a new location visited by their friends 31 ; however, the social network is predetermined from the inputs of empirical data and does not evolve with time.It is therefore not able to track urban growth on a long timescale.
The collective mobility model.To integrate long-term memory effects with dynamic interactions, we propose the collective mobility model (CMM), which offers a minimal set of principals connecting human movements to urban growth (see Fig. 1b).We consider that individuals move according to a return-exploration process similar to IMM.Specifically, the probability that an individual decides to return to visited locations can be computed as Time Explore: Return: Fig. where S is the number of previously visited locations, δ controls the initial return probability and γ controls how the return probability changes with S.During the return, the probability to choose a specific visited location is proportional to his/her previous visitation frequency to that location, that is, equation ( 4).The individual also has a complementary probability P exp = δS −γ to explore a new location.During exploration, the individual will choose previously unvisited locations on the basis of distance and population, satisfying the gravity law of equation (3).The population ρ( − → v ) in equation (3) is calculated instantaneously on the basis of all individuals's locations at time t.The coupling constant, ρ −1 0 , can be considered as a smoothing parameter that controls the strength of population attraction, that is, increasing ρ −1 0 reduces the probability of exploring unpopulated locations and, consequently, increases the strength of social interactions.
Inspired by the strong spatial correlation in correlated percolation models 5 , we are interested in a CMM with the strong coupling limit ρ −1 0 → ∞, describing a unique human mobility model with both dynamic social interaction and long-term memory effects, which has not been explored by previous research.As a result, the proposed CMM can simulate urban growth to arbitrary timescales in a self-organized manner and does not rely on any external input.It is also worth pointing out that IMM can be considered as a special case of CMM, with ρ −1 0 → 0.
Experimental settings.We collect three public available urban development datasets, including the population and urban area of cities in (1)  Urban morphology.To compare the morphologies of the simulated urban systems with empirical observations, we plot the obtained population distributions in Fig. 2a-d, together with the empirical distributions of the city of London in Fig. 2e.Although real-world geometry is affected by geographical features such as lakes and rivers, the city of London still exhibits prominent features of the compact city center and fractal perimeters.These observations echo previous studies on the fractal geometry of urban area [32][33][34] .
The urban population distributions for the Lévy-flight and Gravity models follow the fractional diffusion process (equation (2)), implying that individuals will gradually diffuse away from their initial position over time.The simulation results verify the theoretical prediction that these two models cannot reproduce a compact and stable city center, as the simulated urban population is distributed uniformly in the urban space when the systems converge (see Fig. 2a,b).On the other hand, IMM predicts that urban systems grow homogeneously in the perimeter.The simulation result shows the perimeter of the urban area is a standard circle, and the urban areas that have a similar radial distance to the city center have a similar population density (see Fig. 2c), which is consistent with the theoretical prediction and shows IMM cannot reproduce the fractal morphology of the urban area.By contrast, CMM successfully reproduces the compact city center in an urban system, where the population density is substantially higher than the peripheral urban area (see Fig. 2d).Besides, the perimeter of the city center demonstrates prominent fractal geometry and numerous subclusters are formed around it (see Supplementary Section 4 for details).These observations are in agreement with the empirical observation of the city of London, which indicates CMM can reproduce the morphology of the urban area.We also evaluate the proposed CMM on predicting the future morphology change when given the initial urban area distribution.Experiments show CMM considerably outperforms baselines in terms of the reproduced urban morphology and the prediction accuracy on whether a specific location will be developed in the future (see Supplementary Section 3 for more details).The population distribution of each urban system is visualized as a heatmap on a logarithmic scale, where blue and red colors represent underpopulated and high-population regions, respectively.The urban systems are simulated with 30,000 individuals initially situated in urban centers, which is defined as the central site of simulation space.The individuals are then simulated to move according to the mobility models until the population distribution reaches a stable state.
Urban growth patterns.To examine the model's capability to reproduce urban growth patterns, we focus on three fundamental empirical laws, each of which has been validated on multiple cities around the globe 2,5,6 .
(i) City size distribution: the number of cities N(A) decreases with their areas A, following a scaling law, where the exponent τ has been reported 5 to be around 2.0.Specifically, the scaling law of city size is equivalent to Zipf 's law, with an exponent of τ − 1 (ref. 35).The scaling law with an exponent of 2.0 is therefore consistent with the previous observations on Zipf 's law of city size distribution 35 .Percolation theory is the prevalent framework to characterize this observation, which can generate a probability that a certain location is occupied as an urban area 13 .It predicts the scaling law of equation ( 6) with exponents ranging between 2.0 and 2.5, where τ = 2 corresponds to a strong correlation between different locations and τ = 2.5 corresponds to a mean-field theory 13 .The empirical data shows that city size distributions are well approximated by equation ( 6) (see Fig. 3a), with τ = 1.94 ± 0.11, 2.01 ± 0.08 and 2.08 ± 0.18 for the US, the GB and Berlin, respectively.These findings echo the theoretical predictions of site percolation theory and empirical observations in the previous research 5,13 .
The Lévy-flight model characterizes movement as an individual diffusion process.The urban population distributes uniformly in the urban space as the urban system reaches to a stationary state.Similarly, although the Gravity model introduces social interactions among individuals, the population density satisfies fractional diffusion equation (2).When time t → ∞, the population will distribute uniformly in the urban space ρ(r) = c, which is independent of the coupling constant ρ −1 0 .The Lévy-flight and Gravity models are therefore expected to generate the scaling law with an exponent of around 2.5 in an uncorrelated percolation 13 .To test this hypothesis, we identify the isolated connecting components in the simulation as cities and measure the area of each city 5 .Figure 3d shows that both the Lévy-flight and Gravity models reproduce a power-law city size distribution with τ = 2.55 ± 0.15 and τ = 2.58 ± 0.17, which is consistent with the theoretical prediction.The simulation also shows IMM satisfies the scaling law with τ = 2.98 ± 0.51.This large exponent implies the fact that individuals are localized within their own home range, as the IMM is equivalent to the non-interactive limit of CMM with ρ −1 0 → 0. By contrast, when the coupling constant ρ −1 0 → ∞, the CMM becomes strongly correlated.As a result, it reproduces the scaling law with τ = 2.02 ± 0.13, which agrees with the theoretical predictions and empirical patterns observed in realworld data.All reported exponents range within a 95% confidence interval based on the Theil-Sen estimator 36 (see Supplementary Section 6 for details); however, we find that the proposed CMM is the only model that contains the empirically observed universal exponent of 2.0 in its 95% confidence interval.These results suggest combining the principals of social interaction and long-term memory effects in human mobility can reproduce the empirical city size distribution, which is successfully achieved by CMM.
(ii) Super-linear relation between population and city size: the positive allometric population growth with city size is widely observed in cities around the globe 37,38 .Larger cities tend to have a higher urban population density defined as, ρ A ≡ N(A)/A.Researchers believe that is because the competition for space is more intense in big cities and there are more high-rise buildings to handle denser populations 32,39,40 .Recent research suggested that the balance between the cost and gain of concentrating population in urban areas would explain the observed super-linear growth 6 .This socio-economic hypothesis consists of two assumptions: (1) the average gain from the intense social interaction is proportional to the population density ρ A , (2) the average living cost is proportional to the typical travel distance l t to explore the city.Their balance leads to ρ A ≈ l t (ref. 6).Substituting equation ( 5), we have The first assumption agrees with the social interaction in equation (3), whereas the second is rooted in the long-term memory effect in equation ( 4). Figure 3b plots ρ A with A across different cities in both US and GB, finding that the empirical observation agrees precisely with the predicted logarithmic law of equation (7).It is worth noting that previous studies reported most cities follow scaling laws (that is, ρ A ≈ A δ ) with tiny exponents δ ≈ 0.04 ~ 0.3 (ref. 6); however, within the range of magnitude of the empirical data, the logarithmic relation is indistinguishable with a small-exponent scaling law.The logarithmic relation we identified in equation ( 7) therefore provides an alternative explanation to the origins of population growth patterns, which is derived from human mobility behavior.Figure 3e compares the simulation results for the four models, finding that the proposed CMM accurately reproduces the logarithmic law; however, the simulation results of the other three models are fitted by lines with a coefficient of zero, which indicates that they predict ρ A to be independent from the city size.This result demonstrates our model can outperform baselines in reproducing the empirically observed super-linear population growth.
(iii) Exponential occupation profile: the urban occupation profile ϕ(r) is defined as the probability of finding an populated area at the distance r from the city center, where an exponential profile is observed in previous study 41 , where λ is the exponent of the exponential occupation profile.Empirical measurements of the exponential urban occupation profiles have been made in several past works 5,41 .Figure 3c rechecks this pattern on all three empirical datasets, demonstrating that the city center attracts most of the population, whereas the occupation probability decreases rapidly with r.However, such a rapid decline somehow contradicts the fattailed nature of equation (1), which suggests humans are able to reach areas far away from the initial location 20,32 .Indeed, simulation results in Fig. 3f show that the Lévy-flight and Gravity models fail to reproduce a city center and predict ϕ(r) do not change with with different r.The lack of social interaction makes IMM unable to produce large subclusters on the peripheral area of city, which results in dramatic occupation density drop at the edge of city area.On the contrary, by jointly modelling the social interaction and long-term memory effects, the occupation profile reproduced by CMM agrees well with the exponential law of equation (8).
Moreover, it has been suggested that the declining rate λ shall decrease as the city evolves due to the constantly expanding frontiers of cities 5 , which is in line with the observations in the Berlin dataset where ϕ(r) has been measured at three different time.Figure 4a shows that λ decreases gradually from 0.050 to 0.031.The simulation results of CMM precisely reproduce the evolution of the occupation profile during urban development (see Fig. 4b).Specifically, as more population transit from the initially located city center to the peripheral area, the peripheral area will become more attractive for incoming movement based on the social interaction mechanism.On the other hand, the long-term memory mechanism will maintain a dense area in the city center.The combination of social interaction and long-term memory mechanisms will therefore lead to the gradually flattened occupation profile; such a combination demonstrated that CMM can simulate the gradually increased attractiveness of the peripheral area in a self-organizing manner.We also measure the time-varying occupation profiles of Lévy-flight, Gravity and IMM, finding that they do not capture the time evolution of ϕ(r) (see Supplementary Section 7 for details).Moreover, we find the more sophisticated d-EPR model with a static social interaction network 29,30 also fails to reproduce the occupation profile evolution (see Supplementary Section 1 for details).These results demonstrate that the dynamic interactions in CMM are critical for reproducing the dynamics of the urban growth.

Impact of simulation parameters.
There are three important parameters that might influence the simulation of urban systems: α, ρ 0 and the average population density ω; α is the main parameter that characterizes the principal of cost of travel distance, which controls the distance distribution of each displacement.It is set as 0.55 in previous simulation based on empirical observations 21 .ρ 0 mainly controls the principal of social interaction, it determines the likelihood of individuals exploring more populated regions, and was set to ρ −1 0 → ∞ in past simulations to reproduce the strongly interactive urban systems; ω is defined as the ratio between the number of citizens and the number of sites, that is, ω = M/l 2 .It represents the number of social interactions one would encounter per unit space.We now examine how these parameters may influence the characteristics of the proposed CMM.
The morphology of simulated urban areas and reproduced urban growth patterns with different parameter settings are present in Figs. 5 and 6, respectively.First, as shown in Fig. 5a,b, we observe that ω does not notably affect the fractal morphology of urban systems.Furthermore, as shown in Fig. 6a-c, the city size distributions, population growth patterns and urban occupation profiles all agree with the empirical observations under different ω settings.These results indicate the urban growth patterns are robust with different ω.Second, α has substantial impact on the compactness of simulated urban systems, which is probably because it controls the jump-size distribution of each exploration in equation (1).When α is smaller than the empirical value, the exploration will have larger jump-size; thus, as shown in Fig. 5c, the occupied areas of urban systems are more diverged and distributed further away from the city centers.
By contrast, Fig. 5d shows when α is larger than the empirical value the simulated urban system is more cohesive, and larger subclusters are produced.Moreover, Fig. 6d-f shows city size distributions, population growth patterns and urban occupation profiles all deviate from the empirical rules when α is different from the empirical value.These results show that α plays an important role in both urban morphology and urban growth patterns; thus, α is not only an important parameter in human mobility behavior, but also fundamentally governs urban growth.Third, ρ 0 largely affects the morphology of urban areas.Figure 5e shows that the simulated urban systems show prominent fractal morphology when ρ −1 0 → ∞ as ρ 0 = 10 −6 ; however, when we increase ρ 0 to 1 in Fig. 5f, the fractal urban morphology is not reproduced and it shows homogeneous growth on the perimeter of the urban area.Furthermore, Fig. 6g shows that the exponents of city size distribution gradually increase when ρ 0 increases from 0 to 1.As shown in Fig. 6h,i, the population growth patterns and urban occupation profiles also deviate from empirical rules when ρ 0 increases to 1.These results indicate CMM can accurately reproduces the empirical rules of urban growth when ρ −1 0 → ∞, which corresponding to a singular mobility model that has both strong social interaction and long-term memory effect.On the contrary, the CMM predictions gradually converge to IMM when ρ −1 0 → 0, which is consistent with our theoretical analysis that IMM is a special case of CMM.These results show the parameters α and ρ 0 in CMM have important impact on the compactness of urban system and the fractal morphology, respectively.We therefore set α = 0.55 and ρ −1 0 → ∞ in our CMM to follow the empirical observations and reproduce the strong social interactions.

Discussion
This paper seeks to establish a connection between urban growth and human mobility.It is worth pointing out that although most human mobility models are developed on the basis of the active mobility behavior that characterizes daily movements, urban growth is more closely related to the residential mobility, that is, the relocation of residence.Although residential and active mobility are human movements on different timescales, they share common features in behavior patterns.For instance, the Gravity model is often used in both scenarios 42 .Moreover, the long-term memory effect is equally important for relocation due to numerous socioeconomic factors such as the effect of civic memory and change of workplaces 43,44 .The CMM therefore provides a framework for both active and residential mobility, each corresponding to different experiment set-ups.For example, the relocation has a much smaller transition rate-that is, probability of movement per unit timethan the daily movements.Nevertheless, we do not set a specific timescale in the simulation and find that the universal urban laws observed in the proposed CMM are largely independent of choosing different modeling parameters (see Fig. 6).The application of simulations on different timescales will be investigated in future research.Besides, our study shows long-term memory and social interaction mechanisms are a minimum set of common principles in human mobility that governs the empirically observed universal laws of urban growth.On the other hand, the d-EPR model that leverages static social interaction network also cannot reproduce the urban growth, which shows the limitation of physical determinism models in capturing the self-organizing nature of urban system.By contrast, the proposed CMM is not solely physical deterministic, yet a highly simplified decision-making model where individuals choose their exploration locations based on the dynamic social attractiveness of the locations (see Supplementary Section 5 for more details).Together with the long-term memory mechanism for the return processes, CMM is a human mobility model with a minimal set of decision-making processes that allows theoretical analysis and numeric simulations.We feel that the simplicity of the proposed model makes it a succinct and general framework for urban growth simulation.
Our study has direct implications on several downstream applications 45,46 , including city planning, resource allocation, policy making and so on.Top-down city governing has been shown ineffective in foreseeing urban growth and optimizing urban functions by the previous research 2,47 .Predictive experiments show the proposed CMM outperform baseline models in predicting the morphology change of urban areas (see Supplementary Section 3 for details), which can help city planning and resource allocation to stay ahead of urban development and reduce inefficient investments.The proposed model provides a bottom-up perspective to understand how urban growth emerges from human mobility behavior.It allows the policy-makers to make sense of the driving force of human movements behind urban growth, which can inform the design of policies that optimize urban functions, for example, improving the The present work can be improved from the following directions, which are left for future research.First, the proposed CMM provides a theoretical framework that connects urban growth and human movement; however, the detailed empirical mechanisms that contribute to this connection are not investigated in current research, which might require quantitative and qualitative analysis on the empirical data collected from heterogeneous sources.Second, the proposed CMM is computation intensive due its strongly interactive nature.We propose two sampling methods to improve the efficiency of simulation (see Methods for details); however, further optimization is needed to simulate large-scale urban systems with millions of citizens.Third, the proposed CMM is general and can be further optimized for specific application domain by including relevant contextual features.For example, although we demonstrate the predictive power of our model, it can be further augmented by explicitly considering the contextual features such as lakes, rivers and road networks.methods Urban growth simulation.We simulate urban growth with M individual citizens moving according to the mobility models and considered the occupied sites as developed urban areas.Specifically, urban growth simulation is performed on a two-dimensional square simulation space with l sites in each dimension for T time steps.The simulated citizens are initially located at the central site (l/2, l/2) of simulation space, and then iteratively sample their next sites according to the mobility models at each time step.After the system is converged, we considered the populated sites as an urban area, and examine the morphology and growth patterns of the simulated urban system.Following the settings in previous work 5 , we identify the connected components of urban area as cities and group them in logarithmically spaced bins.The patterns of city size distribution and urban population growth are reported by averaging over the cities in each bin with sliding windows.As for the urban occupation profile, we use sliding windows gradually moving from city centers to frontier to capture the gradual change of urban occupation density.We perform statistical analysis to test the robustness of simulated patterns (see Supplementary Section 6 for more details).
Specifically, we set l = 300, M = 30, 000, T = 20, 000 in the main simulation experiment.The simulation size is typical for a strongly interactive agent-based model 48 , where the computational complexity is considerably high.In each step, it requires to sample the next site of each individual from the entire universe of sites based on the probability distribution derived from human mobility models.The overall time complexity of the memoryless mobility model-that is, Lévy-flight and Gravity models-is therefore O(TMA).As for the memory-aware mobility model, that is, IMM and CMM, we define θ E and θ R as the average probability of exploration and preferential return, respectively, where θ E + θ R = 1, and p as the average number of previously visited sites; the overall time complexity is then O(TM(θEA + θR p)).Although the simulation can be accelerated with parallel processing as the sampling at each step is independent among individuals, the computation complexity is still not scalable to simulate a large urban system.For example, it requires over 690 h to simulate a l = 300, M = 30, 000, T = 20, 000 urban system on a computer with 40 cores of 3.6 GHz Intel i7 processors.We therefore propose two improved sampling methods-alias sampling and sorted array sampling-to improve the efficiency of the simulation.
The alias sampling method aims to optimize the exploration mechanism that is important in all four mobility models, which characterizes how citizens travel to new sites.It is especially time-consuming as in each step it requires sample a site based on a certain probability distribution from the entire universe of sites for each citizen.As a result, we design an improved sampling method by exploiting the insight that the probability distributions for the citizens in the same sites are identical in each step, which allows us to reuse the probability distribution among the citizens in the same sites.To achieve this goal, we implement an alias sampling technique to draw subsequent samples from the same distribution in O(1) complexity 49 (see Supplementary Fig. 2).Denote the sampling space as a ∈ {1, 2, ... , A}, the probability of sampling a as p[a].The alias sampling algorithm is described as two subprocesses: creating an alias table and generate subsequent samples.In the first subprocess, the algorithm works by reshaping the probability distribution over A sites into A pairs of sites {b i | for i ∈ {1, 2, ... , A}}, and the summed up probability of each pair of sites is identical, namely, A −1 .This subprocess is also called the Robin Hood method as it requires to break down the sites with probabilities p[a] > A −1 and pair them with the sites with probabilities p[a] < = A −1 .In the second subprocess, the algorithm first samples a pair of sites from {b i | for i ∈ {1, 2, ... , A}} based on uniform probability distribution.Then, sample a site from the site pair based on its corresponding probability.It is straightforward to prove that each sample generated by the second subprocess follows the exact probability distribution defined by p[a].Note that the time complexity of creating alias table in the first subprocess is O(A); however, the alias table can be reused to generate subsequent samples from the same probability distribution at O(1) complexity.Therefore, the amortized time complexity per draw is O(1) when the number of samples is large.In our simulations, a large number of citizens share the same probability distributions since they are in the same sites.Therefore, the alias sampling technique is able to effectively reduce the complexity of exploration mechanism from O(TMA) to approximately O(TM), which renders the time complexity independent with the size of site universe and substantially speeds up the simulation.
The sorted array sampling method aims to improve the efficiency of simulating preferential return mechanism in IMM and CMM.The probability of preferential return is positively correlated with the number of sites individuals previously visited S. Specifically, we can implement the preferential return mechanism by first sampling an integer from a uniform distribution on the number of total visits of an individual, [0, ∑ i f(v i )].Then, we cumulatively summed up the frequency of each site and select the site to render the sums first above the sampled integer.Therefore, the time complexity of each preferential return is proportional to S since we need to go through all visited sites, and it will be computation expensive when citizens have visited a large number of sites.An important observation is that the probability distribution is skewed among all the sites visited by individuals since the probability is proportional to the visitation frequency associated with each site.Therefore, an intuitive optimization solution is to sort the array of previously visited sites by the frequency, and first examine the site with highest frequency since they are most likely to be sampled.However, sorting an array with S elements requires a time complexity of O(Slog 2 S), which is more expensive than the complexity of sampling O(S).To address this problem, we leverage the insight that the orders of the array will not change notably between two neighboring steps.Therefore, we implement a sorted array data structure to enable the reuse of the array of sorted sites among neighboring steps.Specifically, the data structure contains the pairs of sites and the associated visitation frequency.If the citizens perform a preferential return, we update their sorted arrays by increasing the frequency of sampled sites and moving up the corresponding pairs until the arrays become sorted again.If the citizens perform an exploration, we add a new pair of sites and frequency to the bottom of the array.It is straightforward to prove that the average time complexity of maintaining a sorted array data structure is less than O(log 2 S), which effectively improved the efficiency of simulating preferential return mechanism.
These optimization techniques allow us to reduce the computational time of simulating a l = 300, M = 30, 000, T = 20, 000 urban system from 690 h to 12 h on a workstation with 40 cores of 3.6 GHz Intel i7 processor, which makes it feasible to simulate large-scale urban systems.
Diffusion equation of gravity model.We prove the fractional diffusion equation (2) for evolution of the density ρ( − → v , t) in the Gravity model.We start with a master equation in a site, and then derive the continuous equation by taking the proper limit when the site spacing h approaches zero.The transition rate matrix W ij defines the probability rate of individuals departing from site i and arriving in site j.With equation (3), we have where λ g (h) is a normalization factor that shall scale appropriately with the site spacing h, and α ∈ (0, 2].The corresponding master equation reads, Assuming a small site spacing h, we approximate the summation by a continuous integration, where (−Δ) α/2 is the fractional Laplacian satisfying (−Δ) α/2 f( and c d,α = π d/2 |Γ(−α/2)| 2 α Γ((d+α)/2) .Taking the continuous limit h → 0 requires the existence of lim h→0 λg(h)h α , that is, g(h) ~ h −α for small h.Substituting equation (11) by equation (10) leads to equation (2), where the diffusion constant D ≡ c d,α ρ 0 lim h→0 λg(h)h α .
For α = 2 we recover the standard diffusion equation.Therefore, as long as the diffusion constant D is positive, the simulated urban population will diffuse to eventually follow a uniform distribution.Otherwise, negative D corresponds to a unusual negative diffusion process where urban area cannot exist stably and will sink to a single point.
Data collection and calibration.All datasets used in this paper are publicly available.Specifically, the data of the urban area and population in the US was NatuRE ComputatioNaL SCiENCE | www.nature.com/natcomputscioriginally released by the US Census Bureau for the year 2000 50 .It contains the information of urban size and population in the granularity of neighborhoods or towns.Besides, the data of cities in Great Britain was originally released by the Statistical Office of the European Union for the year 1991 51 .It provides information on the city's coverage and population distribution in the resolution of discrete 200-meter squares.We use the preprocessed datasets released by previous research 52 .The dataset of the Berlin region is extracted from the telemetry images of settlement area distributions in 1910, 1920 and 1945 published by previous work (see Fig. 4.1 in ref. 53 ).We follow the procedures proposed by previous researchers 5,54 to digitize the empirical data into discrete sites.The telemetry images provide information on whether a site belong to urban area, but not the urban population distribution.

Statistics and reproduciblility.
We use two-sided Wald test with t-distribution 55 to examine the null hypothesis that human movement models reproduce urban growth patterns with zero exponents.The detailed statistical analysis results can be found in Supplementary Table 2.
As for the city size distribution, we find all four models reject the null hypothesis with statistical significance: Lévy-flight model (P-value < 0.05, n = 17), Gravity model (P-value < 0.05, n = 18), IMM (P-value < 0.05, n = 11) and CMM (P-value < 0.05, n = 16).Moreover, the theoretical analysis and empirical observations both indicate the exponent should be two due to the strongly correlated urban development 5 .To test this pattern, we use the Theil-Sen estimator to compute the confidence intervals of the exponent reproduced by each model 36 .As a result, the 95% confidence intervals for the Lévy-flight model, Gravity model, IMM and CMM are [2.40,2.70], [2.41, 2.75], [2.47, 3.49] and [1.89, 2.15], respectively.Only the proposed CMM contains the empirical observed two exponents in its 95% confidence interval.
As for the urban occupation profile, we find that the Lévy-flight (P-value = 1.0, n = 25) and Gravity (P-value = 1.0, n = 25) models cannot reject the null hypothesis of 0 exponent, whereas IMM (P-value < 0.05, n = 22) and CMM (P-value < 0.05, n = 25) reject the null hypothesis with statistical significance.
Otherwise, the model predicts the individual to return to a previously visited location r i based on the following equation, where f (r i ) is the frequency of visiting location r i in historic mobility behavior.On the basis of Otherwise, he/she returns to a previously visited location r i based on the preferential return mechanism described as follows, where f (r i ) is the frequency of visiting location r i in historic mobility behavior.α, δ and γ are empirical parameters that set as 0.55, 0.1 and 0.21 to follow the setting of IMM.Besides, we set ρ 0 → 0 in the simulation to consider the scenario of strong social interaction.On the contrary, IMM is a special case of CMM with ρ 0 → ∞.
location of each citizen by relocating them to the closet developed locations shown in 1920 data.
After that, we continue to simulate the urban system until the number of occupied locations reach 15758 as the ground truth in year 1945.
The simulated urban areas of Levy-flight model, Gravity model, IMM and our proposed CMM are presented in Fig. 3B-E, respectively.Specifically, we can observe that our CMM can reproduce dense and cohesive urban areas in the city center and medium size sub-clusters in the peripheral area, which is consistent with the empirical observations in Berlin region.On the contrary, the baseline models predict the urban population to distribute more loosely in simulation space, and generate large number of small and isolated sub-clusters in the peripheral area, which are unrealistic in urban development.Moreover, our CMM also reproduces the fractal morphology that is observed in Berlin region.
Besides, we also quantify the predictive power as the accuracy rate of predicting whether a location will be developed, which is computed by comparing against the ground turth in 1945.
Our proposed CMM has an accuracy of 82.9%, which substantially outperforms the 70.4% of IMM and 68.3% of Gravity model and 68.1% of Levy-flight model.Although our model is not optimized for fine-grained city growth prediction, these results combined to suggest our model has superior predictive power compared to the baselines.The substantial performance gain in city growth prediction can translate into notable social economic benefit in downstream applications, such as city planning and resource allocation, which are left for future research.
random walk models (e.g.Brownian motion and Lévy-flight model) and models solely depend on prior trajectories (e.g.IMM and d-EPR model).Our simulation results also confirmed that these individual-based physical determinism models fail to capture the urban growth.
On the contrary, the proposed CMM explicitly characterizes the dynamic social interaction among urban citizens on high-level, where individuals make decisions about their explorations based on the social attractiveness of the location.Therefore, the proposed CMM is not solely physical deterministic, yet a highly-simplified agent-based model that integrates two key ingredients, i.e., a simplest version of decision making process based on strong social interactions among agents, together with the memory effect for individual return processes.Compared to existing agent-based models that often consists of a large number of microscopic details (e.g., context of current locations and urban environments), CMM is an agent-based model with a minimal set of decision making processes that allows theoretical analysis and numeric simulations for a long-run.
The general framework described by CMM can be easily extended to include more detailed decision making process by considering the fine-grained demographic features and behavioral patterns of urban citizens, which is left as future research.
On the other hand, heterogeneous behavior is another important phenomenon in urban mobility 4 .Although CMM predicts individuals to follow the same rules, we find that the model generates dynamically inhomogeneous movement patterns due to the memory effect.To conclude, these results show the proposed CMM outperforms three baseline models in reproducing city size distribution, super-linear population growth with city size and urban occupation profile with statistical significance.
Simulation Size To evaluate the impact of simulated urban size, we simulate urban systems with l = 100, l = 300 and l = 500, respectively.That is, we simulate the systems on spaces with 100×100, 300×300 and 500×500 locations.Besides, we keep the population density the same as previous simulations, i.e., M/l 2 = 1/3.The obtained results of city size distribution, super-linear

Fig. 2 |
Fig. 2 | the morphologies of urban area generated by four different human movement models and in real-world city.a-e, Data generated by the Lévyflight model (a), Gravity model (b), IMM (c) and CMM (d), as well as empirical data from the city of London (e).The population distribution of each urban system is visualized as a heatmap on a logarithmic scale, where blue and red colors represent underpopulated and high-population regions, respectively.The urban systems are simulated with 30,000 individuals initially situated in urban centers, which is defined as the central site of simulation space.The individuals are then simulated to move according to the mobility models until the population distribution reaches a stable state.

Fig. 4 |
Fig. 4 | the time evolution of urban occupations.a,b, The urban occupation profile ϕ(r) as a function of r for Berlin (a) and CMM (b).Straight lines represent exponential fittings for the corresponding empirical data and simulation results.

1 Fig. 6 |
Fig. 6 | the comparison of reproducing urban growth patterns with different parameter settings.The standard parameter setting is ω = 1/3, α = 0.55, ρ 0 = 0. We change the value of one parameter in each panel.a-c, Reproduced urban growth patterns with varying ω. d-f, Reproduced urban growth patterns with varying α. g-i, Reproduced urban growth patterns with varying ρ 0 .
Lévy-flight model, IMM further models the principal of memory in human mobility by leveraging the preferential return mechanism.Specifically, α, δ and γ are three empirical parameters that are set as 0.55, 0.1 and 0.21 based on the previous research 4 .d-EPR model One closely related recent work proposed a computational urban mobility model, i.e., d-EPR model, to introduce a social dimension into the memory-aware IMM 6 .Specifically, the d-EPR model calculates a location relevance score based on the total number of phone calls originated in each location C(r 0 + ∆r).Then, on the basis of competing returning and exploring mechanisms in IMM, d-EPR model further predicts individuals will have higher probability to explore location with high relevance score:

Fig. 5
shows the trajectories of three randomly-selected individuals from the simulation, where nodes represent the five most visited locations with their size proportional to the visitation frequencies.We find As for the super-linear urban population growth, Table2shows all four models can accurately reproduce the hypothesized function with the R 2 close to 1.However, the p − value of Lévy-flight model, Gravity model and IMM are 1, which means they cannot reject the null hypothesis of zero exponent.On the contrary, CMM can reject the null hypothesis with statistical significance (p−value < 0.05).In fact, Lévy-flight model, Gravity model and IMM predict the urban population density to be invariant with city size, while CMM reproduces the super-linear growth of urban population.As for the urban occupation profile, we can observe that Lévy-flight model and Gravity model cannot reject the null hypothesis of zero exponent, while IMM and CMM can reject it with statistical significance (p − value < 0.05).However, the coefficient of determination R 2 of IMM is only 0.69, indicating it cannot adequately reproduce the negative exponential function.In contrast, the proposed CMM can simultaneously reproduce the negative exponential function (R 2 = 0.99) and non-zero exponent (p − value < 0.05).

1 | the paradigm of human movement models and illustration of the proposed collective mobility model. a,
Existing human movement models can be classified into four categories based on whether they account for social interactions and the memory of historic movements: the Brownian movement and Lévy-flight models belong to the first category, where movements are socially independent and memoryless.Gravity and Radiation models are typical models of the second category, where movements are socially interactive and memoryless.IMM and d-EPR belong to the third category, where movements are socially independent and memory-aware.The fourth category is both memory-aware and social interactive, including the GeoSim model and the proposed CMM.
b, An illustration of the proposed CMM.At time t 1 , an individual is located at site v 0 (green node) and have previously visited S = 4 sites (left panel).Here, a larger node size represents a larger population size at that site, whereas a more intense node color represents a higher past visitation frequency.With Pexp = δS −γ probability, the individual will explore a new site (white circle) based on equation (3).Otherwise, with P ret = 1 − δS −γ probability, the individual will return to a previously visited site (orange circle) based on equation (4).NatuRE ComputatioNaL SCiENCE | www.nature.com/natcomputsci There are subtle but critical distinctions d-EPR model and the proposed CMM.Specifically, while d-EPR model successfully captures individual movements on a daily basis, there are two limitations for modeling urban growth: 1) individuals move independently and do not interact with each other; 2) the background field in d-EPR model is static and does not evolve with time.In con-scaling law.Instead, the number of cities N in d-EPR model decreases with the city area A more rapidly for smaller cities yet more slowly for larger cities.This finding implies that introducing many-body interactions is one of the key ingredients to capture the self-organization nature, and consequently the universal scaling laws of urban growth.2) Dynamic evolution: Unlike a static background field used in d-EPR model, the dynamic interactions in CMM predict urban growth as a time-evolving process.To quantify this difference between d-EPR model and CMM, we measure the occupation profile φ(r) as a function of the radial distance r for both models.Figure1B shows φ(r) at different times, finding that the occupation profile is independent of time for d-EPR model, whereas gradually flattens as the city grows for CMM, in line with the empirical observations (see Fig.4in the main text).Besides, the d-EPR model requires a huge amount of inputting parameters measuring from the number of phone calls at each location, while the proposed CMM is self-contained and nearly parameter free (with only 4 parameters).We believe that this nearly parameter-free nature is one of the strengths of the proposed approach, as it demonstrates the universality of urban growth.Collective Mobility Model Simulation and theoretical analysis show that previous human mobility models are inadequate to predict the morphology and dynamics of urban growth.A key observation is that all three principals are vital to reproduce urban growth patterns.Based on these considerations, we propose a Collective Mobility Model (CMM) to self-consistently model all the key principals in human mobility, namely social interaction, memory and cost of travel distance.Specifically, we leverage the preferential exploration mechanism and preferential return mechanism to characterize these factors and unify them under a competing framework.At the next step t + 1, individual decides to explore a new place with probability P exp = δS −γ or return to a previous visited location with probability P ret = 1 − δS −γ , where S is the number of distinct location previously visited by him/her.If the individual decide to explore a new place, he/she makes a displacement ∆r based on the preferential exploration mechanism described as follows, Note that for applications in a short-time scale such as daily movements we may approximate the dynamic interactions in CMM by a static mean field, where CMM and d-EPR model become similar to each other.However, for applications in a large time scale such as urban growth, the difference between the static mean field (d-EPR model) and dynamic interactions (CMM) is not anymore negligible, making d-EPR model fail to capture the long-term universality of urban growth.