Human Mobility and Epidemic Evolution

The CoViD-19 pandemic ceased to be describable by a susceptible-infected-recovered (SIR) model when lockdowns were enforced. We introduce a theoretical framework to explain and predict changes in the reproduction number of SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) in terms of individual mobility and interpersonal proximity (alongside other epidemiological and environmental variables) during and after the lockdown period. We use an infection-age structured model described by a renewal equation. The model predicts the evolution of the reproduction number up to a week ahead of well-established estimates used in the literature. We show how lockdown policies, via reduction of proximity and mobility, reduce the impact of CoViD-19 and mitigate the risk of disease resurgence. We validate our theoretical framework using data from Google, Voxel51, Unacast, The CoViD-19 Mobility Data Network, and Analisi Distribuzione Aiuti.


Introduction
As the CoViD-19 epidemic persists, understanding the effectiveness of public service announcements and large-scale physical distancing interventions is critical for managing the short and long-term phases of spread of the epidemic. Many countries have reacted by imposing strategies based on mobility and physical lockdowns together with regional and international border restrictions. Many of these intervention policies are based on assessing the risk of an outbreak through compartmental disease models. We intend our model to be complementary to other well-assessed estimates of the reproduction number.
From a practical point of view, it is fundamental to understand which approach best permits one to forecast epidemic dynamics in the presence of incomplete data. This is especially true when a country's healthcare system is overwhelmed and data collection becomes sporadic, or during the early phases of disease spread, when testing is incomplete or non-existent. For CoViD-19 there is the additional problem of undocumented cases.
In our analysis, we focus our attention on the contribution of asymptomatic or undiagnosed (and thus undocumented) individuals to the propagation of the contagion, assuming that these hidden infectious agents have the ability to spread the disease in an environment where susceptible agents are present and all the individuals have uniform mobility and physical proximity parameters. On this basis, we evaluate the impact of physical distancing policies in response to the CoViD-19 epidemic in Italy, the US, and selected European locations. First, we introduce the renewal equation approach to the evolution of epidemics to estimate the reproduction number of SARS-CoV-2 under an active containment strategy, and investigate implications for future risk. Later, we apply the estimate to the response of the epidemic to lockdown and estimate the effect of relaxing this policy alongside other factors. In short, our approach concentrates on the fraction of people who are infectious but have not been detected, i.e., not reported as infected.
We interpret this approach in terms of a macroscopic collision theory of infected individuals in a region with a given susceptible population, taking into account the mobility of individuals as well as their radii of interaction as reliable proxies of physical distancing measures (as explained in the Methods section), as expressed in Eq. (1). Here, the tildes indicate that the variable is to be evaluated with a delay of τ, to account for the serial interval. R̃_0 represents the reproduction number calculated in a given period of time, which also embodies the constant contribution over that time. Next, S_t is the fraction of individuals that are susceptible. Finally, B_t is the transmission rate function, which depends on average contact frequency, the virus's infectiousness, and the infectious age of individuals in the contagion process. In this way, it is a generalization of the interaction variable in compartmental models. The model of Eq. (1) is distinguished from traditional estimates by its forward rather than backward-looking estimation procedure.
Imagine that some infectious individuals have not been detected and isolated. We wish to evaluate a measure of risk of exposure for a given susceptible individual, and we take a kinetic approach to the evaluation of this risk. We imagine that unobserved spreaders are free to infect other individuals and that the contagion acts within a certain radius of an infected individual. We imagine an environment in which two types of individuals are present at a calendar time t: n_s is the density of susceptible individuals in a region, while j_x is the density of active infected individuals there. By active infected individuals we mean individuals who are infectious but have not been detected and are free to move in a given region. We consider the regional mobility, µ, to be the average distance explored by each individual during the time interval ∆t (usually one day).
We define the distance, r, to be the maximum distance that an infected person can be from a susceptible person (in the model) and still cause them to become infected. This distance depends, for example, on the virus's infectiousness as a function of distance and on the use of personal protective equipment, which can create a physical barrier and thereby increase effective distances. Physical distancing regulations, personal protection devices (such as masks), and hygienic norms will result in a decrease in r, as also assessed in [1,2].
We associate the variable λ with contact tracing technologies which can be used to make λ closer to 1, as shown by [3]. The value of λ changes with infection age as well as t during the disease outbreak. These changes might depend, for example, on the ability to detect and isolate individuals, or the efficiency of contact tracing during the epidemic. Contact tracing efficiency varies with the characteristics of the infection and the speed and coverage of the tracing process.
Centralized manual testing and tracing may become an impractical strategy and a lockdown may become a more efficient and effective means of controlling an epidemic. However, lockdowns are not sustainable in the long term because of their social, economic, physical, and psychological effects. Lockdown policies have reduced the spread of CoViD-19, but as restrictions are relaxed transmission often goes up again.
Finally, the model accounts for the number of people at risk (susceptible individuals). There are various factors which contribute to the transmission of a disease. The biological and environmental properties are accounted for in the transmissibility variable η, as explained in the Methods section. Physical proximity, viral load, and environmental conditions determine the infectious dose necessary to trigger the infection in a new host. For example, enclosed environments such as workplaces and schools correspond to higher η values in the model as compared to an outdoor space. A summary of all the variables in the model appears in Table 1.
Table 1 (recoverable entries). Testing: analyzing samples to assess the current or past presence of SARS-CoV-2, via viral (molecular and antigen) tests and antibody tests. Contact tracing: identification of persons who may have come into contact with an infected person.
Now let us recall the actual (or effective) reproduction number, which represents the average number of secondary infections generated by each new infectious case (assuming n_s and other environmental variables retain their current values forever). The actual reproduction number can be used as a predictive tool to track the epidemic's evolution. It is also a measure of epidemic risk, in the sense that if it is significantly above one for long enough, then an outbreak will occur. Thus, by linking a dynamical model with time-series data, one obtains a measure of epidemic risk. This risk is derived in the Methods section, leading to the effective reproduction number of Eq. (4), where t_0 is an initial (or calibration) time, and we have taken the testing efficiency λ and the transmissibility η to be constant during the lockdown periods, as discussed in the Methods section.

Results
The above equation represents the change in the average number of secondary cases caused by a single primary case throughout the course of infection at calendar time t, calibrated at an initial value (for example, before the lockdown). In the present section, we apply Eq. (4) to data from various sources in order to validate our modeling framework. We assume the spatial homogeneity of every variable. In particular, we consider the inverse of the radius of interaction to be the average distance between individuals, ρ(t) ∝ 1/r(t), together with their average mobility, µ(t). Moreover, we consider the fraction of detected cases, λ, to be spatially homogeneous and constant with respect to infection age. We also define the detection age as the typical time interval necessary to detect that an individual is infected; this is our time scale for the evolution of the observed infected individuals. The detection time is typically at least as large as the latent period and can be thought of as equivalent to the incubation time plus the time needed to screen for the infection and isolate the infected individual. This is why we conservatively take the mean detection time to have the same value as the generation time and the serial interval of the contagion.
Figure 1. Comparison between the reproduction number calculated from symptom onset data as in the literature [4] (dashed red line) and the reproduction number computed according to our kinetic approach, using data from [5] for mobility, [6] for social proximity and [7] for epidemic data. Ribbons are the 90% credible interval obtained via bootstrapping. Insets: the solid black line is R(t) using physical distancing variables only; the dashed black line is R(t) due to the depletion of susceptibles only.
The changing trends of the reproduction number may be due to several interrelated reasons apart from physical distancing policies. These reasons can be collected into two groups. The first has to do with the virus itself and its capacity to spread. Favorable environmental conditions or the emergence of less dangerous strains can decrease the effective infectiousness of the contagion. The other group of reasons is connected to the decrease in the susceptible population. On the other hand, physical distancing (also known as social distancing) is a practice recommended by public health officials to stop or slow down the spread of contagious diseases. It requires maintaining physical space between individuals who may spread certain infectious diseases. The data repositories used to obtain our results are listed in the Supplementary Information.
As proxies for mobility we consider both the changes in movement fluxes and percent change in average distance traveled, as released respectively in Google's mobility report [5] and Unacast's scoreboard [8]. We take the mobility to represent the average relative speed of the individuals with respect to each other. The fact that we use relative velocity is important, as it properly accounts for situations in which people move rapidly in a coordinated way.

We infer a measure of proximity from the active population density, i.e., the number of people per unit area moving about in selected locations, as variously reported by Voxel51's proximity index [6] and Unacast's human encounters [8]. In the singular case of Italy, we take the number of face masks distributed to the population as a proxy for physical proximity (i.e., we assume the number distributed is effectively equivalent to a certain interpersonal distance).
Figure 2. Effective reproduction number for Italy during the lockdown period (March 9th to May 18th). We compare the traditional derivation with the analytical derivation from the method of this paper, which includes depletion of the susceptible population, individual mobility and physical proximity. Inset: lockdown only on mobility without change in physical proximity.
In the general analysis of epidemic data we refer to reported infected persons by their dates of diagnosis via laboratory test. However, some countries also report infections by the date of first symptoms reported by patients. In particular, we have used the latter type of data when possible (Italy) and inferred it in the case of the USA and the UK via an analysis of the effective reproduction number assessed by [4,9,10].
We use epidemiological data at the level of states and mobility data at the level of cities for US locations and at the level of the state for EU countries. We have studied and analyzed these regions during the period in which lockdown policies were in force, as reported in [11]. Finally, for US states, we use [4] as the estimate of the reproduction number as well as of the susceptible population considered. When analyzing other countries, we use various sources, averaging Epiforecast [9] and Covid19 projections [10] so as to have an ensemble calculation of the actual reproduction number R(t).
In Fig. 1, we show the hardest hit states in the US as of June 2020: New York and Florida. Note the good agreement between the theory of this paper, using mobility and proximity, and independent estimates of the reproduction number. Note that for New York state an important cause of the reduction in R(t) is the depletion of the susceptible population, while physical distancing has a smaller impact. Meanwhile, in Florida, the behavior of R(t) is mainly due to physical distancing restrictions taken up at the end of the shelter-at-home policy.
The use of appropriate face coverings should reduce the transmission of CoViD-19 by individuals who do not have symptoms and may reinforce physical distancing. Public health officials also caution that face coverings may increase risk if users reduce their use of other efficacious measures such as physical distancing and frequent hand washing.
In Fig. 3, we show the two derivations of the effective reproduction number R(t). The first is found using an ensemble estimation and Epiforecast [9]. In Fig. 3a we further study the diffusion of the second wave of CoViD-19 in Italy, and in particular in one of the hardest-hit regions, Lombardy, where we have used mobility and proximity data from the Data for Good program [12] (see also SI Appendix B for further discussion and results). We applied our analysis to the epidemic data through November 2020 in some US states, using [4] as the reference for R_t. In Fig. 3b we show how our analysis closely matches the epidemic risk trend by using mobility data and new cases 15 days earlier than typical R(t) estimations in the literature.
Alternatively, we perform a further analysis comparing our social R_t with the reproduction number computed by a direct renewal equation using the number of cases by date of symptom onset, as in Fig. 4 (data from [13] for Italy). In this figure we have used the renewal estimate of the reproduction number as in [14] and the social-distancing-based estimate using two datasets for human mobility trends, from Google [5] and [12]. We notice that the anomalous peak observed in the epidemiological estimate in October is not present in the social estimate. This effect is largely due to an abrupt increase in the number of tests performed in that period. Finally, we call attention to the fact that mobility alone is not sufficient to explain the dynamics of epidemics, as discussed in [15]. We see that physical proximity is crucial in resolving why a relatively stable R(t) below 1 has persisted, despite an increase in mobility after the end of the lockdown period. On the other hand, one should subtract from the susceptible population the number of asymptomatic or undocumented infected individuals, who are not counted in official reports. We provide an estimate for this number in the Methods section.
The effects of vaccination on R_t are difficult to estimate. In Fig. 4 we present a preliminary analysis of these effects. Taking into account vaccination is necessary to accurately estimate the social reproduction number once a significant fraction of the population has been vaccinated. We assume that vaccination reduces the fraction of the population that is susceptible by a factor of 1 − ν(t), where ν(t) is the fraction of the population that has received the vaccine at time t. It is outside the scope of this paper to explore the effects of vaccination in depth; however, we stress the importance of using this information to properly assess the reproduction number via mobility data. In a follow-up paper, the authors intend to analyze the effects of different types of vaccines on R_t at the regional (and higher) level.
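The vaccination adjustment described above can be illustrated with a minimal Python sketch that scales a susceptible fraction by 1 − ν(t). The function name and the sample numbers are hypothetical and are not taken from the paper's data. The sketch also assumes that every vaccinated person is fully protected, which is a simplification.

```python
def susceptible_with_vaccination(s_t, nu_t):
    """Return the susceptible fraction after removing vaccinated people.

    s_t  : susceptible fraction before accounting for vaccination (0..1)
    nu_t : fraction of the population vaccinated at time t (0..1)
    Assumes every vaccinated person is fully protected (illustrative only).
    """
    if not (0.0 <= s_t <= 1.0 and 0.0 <= nu_t <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return s_t * (1.0 - nu_t)


# Hypothetical trajectory: vaccination coverage rising over four weeks.
coverage = [0.00, 0.05, 0.10, 0.20]
adjusted = [susceptible_with_vaccination(0.9, nu) for nu in coverage]
```

Because the susceptible fraction enters the reproduction number multiplicatively, the same factor would carry over to a mobility-based estimate of R_t under these assumptions.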

Discussion
The outbreak of the CoViD-19 pandemic has pushed many countries towards a response that relies on the policy of social distancing, the implementation of which has important social and economic impacts on the organization of production and on the work process. In response to the CoViD-19 pandemic, countries have introduced various levels of 'lockdown' to reduce the number of new infections. From Eq. (4) it is evident that as the epidemic evolves the force of infection is reduced for various reasons, primarily due to physical distancing policies adopted by most countries in the form of a lockdown of human mobility. Since it is not practical to tighten physical distancing beyond a certain socially and economically acceptable level, the only foreseeable reasons for the end of an epidemic are the depletion of the susceptible population (immunization), a change in the intrinsic infectiousness of the virus, a sustained change in public hygiene habits (mask wearing, physical distancing, etc.), or innovation in contact tracing, testing, and isolation; see [16] for a discussion.
Mechanistic models of disease transmission are often used to forecast disease trajectories and likely disease burden, but are hampered by substantial uncertainty in disease epidemiology in the presence of significant social feedback. Models of disease transmission dynamics are hindered by uncertainty in the role of asymptomatic transmission, the length of the incubation period, the generation interval, and the contribution of different modes of transmission. Phenomenological models provide a starting point for estimation of key transmission parameters, such as the reproduction number, and forecasts of epidemic impact [17,18,19,20,21,22]. These models all have at least one state that can infect agents that are not currently infected with the disease.
Infectiousness depends on the frequency of contacts and on the level of infection within each individual. In airborne infections, the former can be decomposed as a product of mobility and physical proximity, interpreted broadly as an effective distance measure which also includes the amount and type of physical protection used. The latter involves an internal microscale competition between the virus and the immune system which depends on environmental factors like pollution levels and repeated viral exposure, which can modify the viral load shed by infectious individuals.
We have mainly focused our study on the spread of a contagion in a homogeneous population, using a renewal collisional equation which has proven to be a powerful tool for analyzing and modeling epidemic data alongside other well-established measurements of the reproduction number. We have found it to be both practically and conceptually useful. This analysis has focused on the lockdown, but the same theoretical tools, along with additional technology and data resources, show promise for the analysis of the post-lockdown response and further mitigation of this disease.
At this stage, we do not investigate the dynamics of the severity of the disease. In order to examine these dynamics, we would need to focus our attention on the microscale corresponding to viral particles and immune cells, since these agents induce the dynamics of the varying intensities of the disease observed at the macroscopic scale of the human population.
Furthermore, to assess the severity of an epidemic in a population, one should take into account both the reproduction number R(t) and the absolute number of cases. A high R(t) is manageable in the very short run as long as there are not many people sick to begin with. An important aspect of R(t) is that it represents only an average across a region. This average can miss regional clusters of infection. Another subtlety not captured by R(t) is that many people never infect others, but a few 'superspreaders' pass on the disease many more times than average, perhaps because they mingle in crowded, indoor events where the virus spreads more easily. This means that bans on certain crowded indoor activities could have more benefit than blanket restrictions introduced whenever the R(t) value hits one. In conclusion, in addition to R(t) one should look at trends in numbers of new infections, deaths, hospital admissions, and cohort surveys to see how many people in a population currently have the disease, or have already had it.
Fatality rates and intensive care hospitalization rates are related to disease severity. In our collisional kinetic framework we have considered contacts among individuals to be random. In addition to these erratic contacts, one can consider structured contacts occurring at home, in hospitals, workplaces, and schools, just to mention a few of the possibilities. For structured contacts, we should consider the use of a different approach than collision theory.
One example of a situation in which interactions are more structured is in the theory of random growth of surfaces. In the model considered by [23], the growing surface is represented as a set of columns, which can be thought of as the individuals of a society that interact. These individuals influence each other and self-organize in the presence of noise so that anomalous scaling and long-range correlations are produced, which are a manifestation of the cooperation among individuals. Since people interact in correlated ways [24], an extension of the collisional model of the present paper to include correlations among the movements of individuals would be more realistic (and likely important for small population sizes or parameter values near R = 1 or R = 0). For a simulation of the interplay between the social and epidemiological effects in a two-layer network, see [25].
We stress two advantages of using the social R_t alongside the traditional estimations. First, the social estimate of R_t is available a couple of weeks earlier than the epidemiological estimate. Second, if a deviation is observed between these two estimates, it may be a sign of a change in the transmissibility variable η that should be embedded in the model.

Methods
The renewal equation was introduced in the context of population dynamics studies. Later it was reinterpreted along the lines of stochastic processes, as in [26], where transmission occurred via a Poisson infection process. This process is such that the probability that, between time t and t + δt, someone infected a time τ ago successfully infects someone else is A(t, τ)δt, where δt is a very small time interval. As a consequence, the predicted mean infectious incidence at time t follows the so-called renewal equation:
$$ j(t) = \int_0^{\infty} A(t, \tau)\, j(t - \tau)\, d\tau , $$
where τ is the generation time, which describes the duration from the onset of infectiousness in the primary case to the onset of infectiousness in a secondary case (infected by the primary case), and j(t) is the rate of production of infectious individuals. The kernel A(t, τ) is the average rate at which an individual infected τ time units earlier generates secondary cases.
The kernel A can be factorized as A(t, τ) = n_s(t) β(t, τ) Γ(t, τ), where β(t, τ) is the product of the contact rate and the risk of infection (i.e., the effective contact rate), and Γ(t, τ) is the probability of being infectious at infection age τ. So, a reduction in contact frequency with calendar time t affects β(t, τ), while early removal of infectious individuals at calendar time t changes the form of Γ(t, τ). An earlier average infection age at first transmission of the disease will result from contact tracing and isolation. However, the classic approach to renewal equations for epidemics assumes, as in [19,27,28,29], that the non-linearity of an epidemic is characterized by the depletion of susceptible individuals alone, so that A(t, τ) = n_s(t) β(τ) Γ(τ).
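To make the renewal mechanism concrete, the following Python sketch iterates a daily discretization of the renewal equation with the classic kernel A(t, τ) = n_s(t) β(τ) Γ(τ). It is a sketch under stated assumptions, not the authors' implementation: the function name, the one-day time step, and all parameter values are hypothetical, and incidence is expressed as a fraction of the population.

```python
def simulate_renewal(kernel, seed_incidence, steps, n_s0=1.0):
    """Iterate j(t) = n_s(t) * sum_k kernel[k] * j(t - 1 - k) in daily steps.

    kernel[k] stands for beta * Gamma at infection age k + 1 days, so the
    full kernel is A(t, tau) = n_s(t) * beta(tau) * Gamma(tau).
    Incidence j and the susceptible fraction n_s are population fractions.
    """
    j = list(seed_incidence)
    n_s = n_s0 - sum(j)
    for _ in range(steps):
        t = len(j)
        pressure = sum(kernel[k] * j[t - 1 - k]
                       for k in range(min(t, len(kernel))))
        new_cases = min(n_s, n_s * pressure)  # cannot infect more than remain
        j.append(new_cases)
        n_s -= new_cases
    return j, n_s


# Hypothetical parameters: constant effective contact rate, infectiousness
# fading over four days. sum(kernel) is the reproduction number when n_s = 1.
gamma = [1.0, 1.0, 1.0, 0.5]
growing, s_left = simulate_renewal([0.5 * g for g in gamma], [1e-6], steps=30)
fading, _ = simulate_renewal([0.2 * g for g in gamma], [1e-6], steps=30)
```

With these illustrative numbers the first kernel sums to 1.75 and incidence grows, while the second sums to 0.7 and incidence decays. Lowering β (fewer contacts) or truncating Γ (earlier isolation) both shrink the kernel sum.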
Finally, the proportion of persons who have the ability to infect at a given calendar time is given by the number of infected individuals, which is called the prevalence, p(t). Notice that p(t) is not the number of active infected individuals generally reported in epidemic data published by different national health services. This is because the officially detected cases are actively confined (in hospitals or at home) and so their contribution to the spread of the epidemic is not so relevant. On the contrary, p(t) represents the infected people who are still conducting their lives as usual, possibly infecting other people. The most important assumptions in our use of phenomenological models are: (1) a short time scale of the epidemic (much shorter than the characteristic birth and death time scales of the population); (2) a well-mixed population (the force of infection is the same for all ages, sexes, etc.); (3) a closed population (no immigration or emigration); (4) a small initial shock (the initial infected group is extremely small with respect to the size of the susceptible population).
Using the collision theory for chemical reactions in solution with two types of molecules, we can write down the rate of contacts between the two types in a given volume, per unit time, as z = n_s j_x 2πrµ, where we have assumed that all agents are ideal point particles that do not interact directly and travel through space in straight lines. We further investigate the assumptions of such a collision model in [30]. However, not all contacts will result in secondary infections; rather, only those contacts that carry sufficient viral load to surmount a certain threshold will trigger the infection. Such transmission efficacy should depend inversely on the physical distance between individuals. Moreover, the collision rate, in reality, depends on time and, in general, on the epidemic's evolution, because the total number of agents changes over time. As an approximation, we embed all of these complexities in the choice of the radius r, so as to maintain the simplest form of the cross-section z.
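The collision rate above lends itself to a short numerical illustration. The Python sketch below evaluates z = n_s j_x 2πrµ and the contact rate per active spreader, z / j_x, relative to a baseline time. Reading that ratio as a proxy for the relative change in the reproduction number is an assumption that relies on η and λ being constant, as the text states for lockdown periods. The function names and the numbers are hypothetical.

```python
import math


def contact_rate(n_s, j_x, r, mu):
    """Collision rate z = n_s * j_x * 2*pi*r*mu, as written in the text."""
    return n_s * j_x * 2.0 * math.pi * r * mu


def relative_contact_rate(n_s, r, mu, n_s0, r0, mu0):
    """Contacts per active spreader (z / j_x) relative to a baseline time."""
    return (n_s / n_s0) * (r / r0) * (mu / mu0)


# Hypothetical lockdown: mobility halved, interaction radius cut by 20%,
# susceptible density unchanged with respect to the baseline.
ratio = relative_contact_rate(n_s=1.0, r=0.8, mu=0.5,
                              n_s0=1.0, r0=1.0, mu0=1.0)
```

The constant 2π and the spreader density j_x cancel in the ratio, so only relative changes in n_s, r and µ are needed. This is convenient because mobility and proximity datasets typically report changes with respect to a baseline rather than absolute values.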
Suppose that during an outbreak only a certain fraction of infectious persons are observed through direct testing, while other infectious individuals are not observed, e.g., because of lack of symptoms or the mildness of their illness. In particular, asymptomatic secondary transmissions, caused by those who have been infected and have not developed symptoms yet, and also by those who have been infected and will not become symptomatic throughout the course of infection, must be considered. At a given calendar time, t, we imagine that the important new cases are not the observed newly infected (who are quarantined or self-isolate), but rather the fraction of newly infected who are not observed. Some of these unobserved infected spread the disease. The observed cases are a fraction λ_t of all cases, where λ_t is the rate of detection, which can change over time depending on the details of, and degree of adherence to, testing protocols and medical screenings. Moreover, the observed cases together with the undocumented cases constitute all cases, which fixes the relation between undocumented and documented infected individuals in terms of λ_t. If the population screening procedure is fully effective, we have λ = 1. This could happen, for example, if the infected group is made up of only symptomatic persons who are infectious only after the onset of symptoms. As a first approximation, we have considered η to be constant over the time periods we considered, and λ to be a slowly changing function (over a time scale of τ_A with respect to the calendar time t), so that λ(t − τ) ≈ λ(t). Finally, the actual (or effective) reproduction number can be written as the incidence-prevalence ratio, scaled by the average infectious period (or mean generation time), where prevalence is the proportion of persons who have the ability to infect at a given calendar time. This ratio indicates the propensity of currently infected individuals to infect susceptibles.
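The bookkeeping between observed and undocumented cases described in the paragraph above can be written out explicitly. In the sketch below, j_obs (observed incidence) and j (total incidence) are labels assumed here for illustration only; j_x denotes the undocumented cases, following the notation used for active infected individuals elsewhere in the text.

```latex
\begin{align}
  j_{\mathrm{obs}}(t) &= \lambda_t \, j(t), \\
  j(t) &= j_{\mathrm{obs}}(t) + j_x(t), \\
  j_x(t) &= \frac{1 - \lambda_t}{\lambda_t} \, j_{\mathrm{obs}}(t).
\end{align}
```

The third line follows by eliminating j(t) from the first two. For λ_t = 1 it gives j_x = 0, which matches the statement that a fully effective screening procedure leaves no undocumented cases.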
Therefore the actual reproduction number can be written as an incidence-prevalence ratio, where we have also considered some practical issues in the calculation of the reproduction number, as given in [31]. Note that R(t) does not depend explicitly on Γ(τ), except through its integral over all possible values of τ. Thus, to an adequate degree of approximation, it only depends on the typical time between infection and detection. Indeed, one can replace the Γ distribution with any distribution with the same mean recovery time (i.e., time to become non-infectious). As a consequence, the most that a change in Γ (as a function of τ) can do is rescale R(t). In practice, however, the infectious age distribution depends on t, since contact tracing, testing, and isolation (as well as treatments) tend to reduce the active infectious period, and their use depends on t. On the other hand, the absolute scale of R(t) is also important, since one would like to maintain a value of R below 1.
Calibration. We discuss two important points for the calibration of the social distancing estimate of R_t. First, calibration is required since we need to align the social reproduction number we compute with the reproduction number derived from epidemiological data by traditional methods. Regressing the two variables, we find a constant which is then embedded in R̃_0. The second point concerns possible misalignment between the two estimation procedures because of intrinsic discrepancies in the data we use. For this purpose, we evaluate the multiplicative scaling factor in the reproduction number of Eq. (4), using linear regression with the intercept fixed at zero.
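The zero-intercept regression mentioned above has a closed-form slope, which the following Python sketch computes. It is a minimal illustration rather than the authors' code: the function name and both data series are hypothetical, and ordinary least squares is assumed as the fitting criterion.

```python
def scale_through_origin(social, epidemiological):
    """Least-squares slope k for epidemiological ~ k * social, no intercept."""
    sxx = sum(s * s for s in social)
    if sxx == 0.0:
        raise ValueError("social series must contain a non-zero value")
    sxy = sum(s * e for s, e in zip(social, epidemiological))
    return sxy / sxx


# Hypothetical series: an uncalibrated mobility-based index and a
# reference estimate of R(t) for the same days.
social_index = [1.00, 0.80, 0.50, 0.40]
reference_r = [2.50, 2.00, 1.25, 1.00]
k = scale_through_origin(social_index, reference_r)
calibrated = [k * s for s in social_index]
```

Fixing the intercept at zero keeps the calibration purely multiplicative, so the calibrated series vanishes when the uncalibrated index does.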
A second step of the calibration is the estimation of the fraction of the population that is infected, which is particularly important for a longer-term analysis. This is accomplished by studying the dependence of the reproduction number R(t) on the ratio, c(t), between the official number of people infected and the total population of the region (Italy) or state (US) [14]. The value of λ has changed over time throughout the epidemic, and after the end of the spring 2020 lockdowns it increased, possibly due to the increased number of tests performed.
Finally, when plotting the reproduction number, to visualize the trend, we use non-parametric regression analysis with LoWeSS (Locally Weighted Scatterplot Smoothing) surrounded by a 90% confidence interval obtained through bootstrapping.
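The smoothing step can be sketched in plain Python as follows. This is an illustration under explicit assumptions, not the authors' pipeline: the local fit is linear with tricube weights and no robustness iterations, the band comes from a residual bootstrap with the time points held fixed, and the function names, parameters and synthetic data are all hypothetical.

```python
import random


def lowess(x, y, frac=0.4):
    """Local linear fit with tricube weights, evaluated at each x[i].

    No robustness iterations are performed (a simplification).
    """
    n = len(x)
    k = min(n, max(3, round(frac * n)))  # neighbours in each window
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1.0
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def bootstrap_band(x, y, frac=0.4, n_boot=200, level=0.90, seed=0):
    """Residual-bootstrap band around the smoothed trend (x kept fixed)."""
    rng = random.Random(seed)
    trend = lowess(x, y, frac)
    resid = [yi - ti for yi, ti in zip(y, trend)]
    curves = []
    for _ in range(n_boot):
        y_star = [ti + rng.choice(resid) for ti in trend]
        curves.append(lowess(x, y_star, frac))
    lo_idx = int((1.0 - level) / 2.0 * (n_boot - 1))
    hi_idx = int((1.0 + level) / 2.0 * (n_boot - 1))
    lower, upper = [], []
    for i in range(len(x)):
        column = sorted(c[i] for c in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return trend, lower, upper


# Synthetic example: a noisy, slowly declining reproduction number.
days = list(range(30))
noise = random.Random(1)
r_t = [2.0 - 0.04 * d + noise.gauss(0.0, 0.1) for d in days]
trend, lower, upper = bootstrap_band(days, r_t)
```

A library implementation of LOWESS could be substituted for the hand-written smoother. The pure-Python version is used here only to keep the example self-contained.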