Use of mobile phone sensing data to estimate residence and mobility times in urban patches during the COVID-19 epidemic: The case of the 2020 outbreak in Hermosillo, Mexico

,


Introduction
It is increasingly evident that we are witnessing a continual rise in global and regional connectivity.Understanding mobility patterns is crucial to developing more realistic mathematical models to comprehend various phenomena such as economic trends, patterns of violence, information dissemination, and the spread of infectious diseases [25,41,49].To practically utilize these more realistic mathematical models, we need to estimate their complex high-dimensional parameters.However, this estimation problem is often not addressed, as many works proposing new models tend to study them theoretically or through simulations.Alternatively, they may reduce complexity by employing specific mobility models, thereby reducing the dimensionality of the problem.For example, some dynamic human systems use mobility parameters described by gravity or radiation models, producing more specific mobility parameters like origin-destination mobility matrices.
In this study, we leverage big data from smartphone geolocation information and the Brownian Bridge model to propose estimating mobility parameters beyond the conventional origin-destination matrices used in network models.Specifically, we focus on what we term the Residence and Occupation Matrix (ROM).Unlike the origin-destination matrix, which is typically used for commuter and migrant models, the ROM does not describe the number of travels between any two zones, but instead depicts the expected fraction of time that an individual residing in a specific zone spends in all other zones.This matrix allows us to characterize mobility patterns based on the most significant zones for individuals, according to the time they spend in them during various activities such as work, school, and commuting.
Although human mobility and COVID transmission have been extensively researched in the field of GIScience, valuable insights for future research are provided by Zhang et al. [58].These insights include 1. Fostering multidisciplinary collaboration, 2. enhancing mathematical models for analysis and prediction of disease transmission, simulation, and prediction, and 3. diversifying sources of mobility data to ensure accuracy and suitability.
This study addresses the first point as part of a multidisciplinary effort involving industry, the healthcare sector, epidemiologists, mathematicians, statisticians, and computer scientists.The complexity of data gathering, model development, statistical inference, and big data challenges necessitated this multidisciplinary approach.The outcomes of this research are directly related to [2] and [1].Regarding the second point, the works cited address this aspect, since we propose the epidemic model and theoretically studied it in [2], and divide the estimation problem into two parts: first, the estimation of mobility parameters (presented here), and second, the statistical inference of remaining epidemic parameters [1].The inference of the epidemic model using this strategy is based on smartphone information and COVID-19 incidence data in Hermosillo.
Regarding the last point, we acknowledge that while other studies utilize similar geolocation (GPS) data, they often do not fully exploit its potential.Typically, these studies focus on estimating mobility information related to residence location and origin-destination matrices, relying solely on the frequency of GPS reports in each zone.In contrast, our utilization of the Brownian model enables us to estimate comprehensive spatio-temporal information from discrete-time geolocation reports, offering significant opportunities for modeling various human activities in a metropolitan area in detail.Moreover, our approach facilitates the incorporation of estimates derived from real-time or near-real-time data acquisition, holding considerable promise for enhancing the application of mathematical models.This has the potential to advance our understanding and response to social, economic, or health-related events, such as the epidemic model outlined in this study.
The structure of the paper is as follows: Section 2 offers background information on mobility studies and epidemic models that incorporate mobility components.Section 3 outlines the demographic and GPS data collection from mobile phones.In Section 4, we detail the methods for residence selection and introduce the Brownian Bridge model, which we employ to estimate occupation times.Section 5 presents the results of the residence selection and the estimated ROMs for the designated periods.Here, we compare the ROMs to assess their stability and sensitivity in light of epidemic data, mitigation strategies, and celebrations.Additionally, we examine the sensitivity of the epidemic model under realistic mobility scenarios based on estimated mobility parameters.Finally, Section 6 provides a discussion of the results and outlines avenues for future research.

Background
Detailed human mobility records of a particular regional population can be obtained from mobile phone records [47] and such information can be used to predict future user's locations, validate human mobility models [36], study the role of human mobility and estimate the main characteristics of social networks [55], among others.Indeed, the desire to understand and predict how the human population moves in the real world has been one of the main objectives of data collected by mobile phone service providers [39].The diversity of applications of the study of human mobility cannot be over-emphasized.For instance, in the area of transportation, the study of human mobility can be used as a guide for the optimization of road networks and public transport systems, leading to efficient urban planning and engineering.Some other applications include humanitarian relief, health services, and identification of social graphs [30,35,39,43,44,50,56].
The heterogeneous human interactions can significantly impact the dynamics we study.In epidemiology, traditional mass-action compartmental models have proven their usefulness for different infectious agents in modeling infection dynamics at the regional level.However, for several diseases, it is crucial to consider the structure of sub-populations (defined by zones or patches) and their more specific contacts patterns that are usually induced by the spatial configuration and mobility.
To introduce heterogeneous human contacts, epidemics in networks model for individual connections (or reaction-diffusion models on meta-population networks) have played an essential role in journals with a physics focus [11,13,46,54].Nevertheless, the model inference, fit or selection, uses highly computing-intensive numeric methods such as Markov chain Monte Carlo (MCMC) or particle filtering based on Monte Carlo simulations.Unfortunately, the scaling of these methods makes them prohibitive when considering medium-to larger-size populations.This paper focuses on estimating the information used in the class of patchy epidemic models [2,8] that fall into an intermediate complexity between mass-action and networks models (at individual level).These models introduce population heterogeneities at the level of sub-populations and can include the within-and between-contact patterns induced by mobility.
Since this work emphasizes the applications for patchy epidemic models, from now on we refer as "patches" to the geographical zones that partition the area of interest.
Several epidemic models have proposed the introduction of regional mobility patterns [5,8,9,15,31,57] for human-to-human infectious diseases [7,26,34] or vector-borne diseases [3,14,29,32].These can incorporate mobility with an emphasis on a regional scale, which is critical in the increasing number of cases at the beginning of the outbreak [33].However, we want to evaluate mobility on the scale of the metropolitan area since most of these movements occur in the context of the daily existence of 55% of the world population living in cities [51,60].
Numerous authors have sought to identify and model mobility patterns using a wide array of data sources, ranging from traditional sources such as surveys and census data to more contemporary sources like GPS data, social media feeds, and online platforms [6].While official data and social networks may offer insights into mobility at a macroscopic level, covering large regions such as states or countries, they often lack the granularity needed to understand mobility within a metropolitan area in detail, both spatially and temporally.
For decades, scientists have studied animal movement using tracking devices affixed to wildlife, which generate detailed datasets capturing movement patterns within specific populations.More recently, geolocated smartphones have emerged as powerful tool for human tracking devices for humans, owing to their widespread adoption among a significant portion of the population.This ubiquity facilitates the collection of vast geolocation datasets, which provide detailed insights into human mobility and behavior with unprecedented spatial and temporal resolution.Consequently, this enables the study of human travel patterns at finer scales in both time and space [17,37,41,45].Mobile phones represent a valuable source of information for studying various aspect of human behavior, environmental monitoring, transportation, social services, businesses [16], and hold untapped potential as tools for public health.
While the potential utility of data sourced from mobile phones has been recognized for several years, analyses typically center around estimating population sizes in specific areas, identifying city activities, hotspots, and characterizing mobility patterns through contact networks [10] or metrics [23].These insights are often derived directly from reported GPS records and aggregated in time intervals.
In the context of epidemiological models, researchers have focused on understanding certain types of movement that exhibit regular patterns, such as commuting between two regions [48].Typically, these movements are estimated based on the frequency of GPS reports in each area.In more theoretical approaches, authors often assume specific models for human mobility from one region to another, such as adjacency and gravity models [6,25].Additionally, the impact of mobility, according to these models, is illustrated under various mobility scenarios [49].
For these these epidemic models, the mobility estimation reduces to inferring the percentage of individuals for each pair of residence and destination regions.However, epidemiological models, such as [2,8], consider mobility not only as commuting between two patches, but also as the different temporal contacts that individuals have with individuals on routes to various destinations during the day.These other epidemiological models then use the information on the origin (residence) and the proportion of time individuals spend in each region (including their own).We propose estimating this Residence and Occupation Matrix (ROM) with a model that takes into consideration the individuals' sequence of GPS reports ('pings') and the stochastic variation from the path between consecutive locations.
The objective is to determine the residential patch of each individual and statistically estimate the average time spent by inhabitants in each of the n patches of interest within the city, such as zip codes and other geographical areas.This involves generating an n × n non-negative matrix that delineates the proportion of time that individuals residing in a specific area are expected to spend in each patch.Our proposed method is based on the Brownian bridge process commonly employed in ecology to analyze animal movement.By leveraging this method, we can estimate an individual's location at any given time between successive pings.We applied this approach to data from Hermosillo, Mexico, during the COVID-19 pandemic.Additionally, we integrated the estimated residence-mobility matrix into a multi-patch compartmental SEIR model to evaluate the epidemic impact of mobility changes resulting from governmental interventions.

Geographic and demographic information
The city of Hermosillo is in the state of Sonora in the northwest of Mexico and has an area of 169.55 km 2 .It is the capital of the state, and it is located 280 km south of the border with the United States and about 110 km from the coast of the Gulf of California.More precisely, it is located at the 29 • 05 ′ parallel of the north latitude and the 110 • 57 ′ meridian of west longitude from Greenwich (Figure 1A).The Population and Housing Census1 , conducted by the National Institute of Statistics and Geography (INEGI) in 2020, reports a population of 2,944,840 in the state of Sonora, and the municipality with the largest population in this state corresponds to the city of Hermosillo, with a total population of 936,263.The local time zone in Hermosillo corresponds to UTC-07:00 throughout the year, and in contrast to all other states in Mexico (except Quintana Roo), Sonora do not switch to the national summertime that in 2020 was from April 5 to October 25.
The city and its metropolitan area are divided into 502 basic census geographical units (AGEB), as shown in Figure 1.These areas represent the smallest demographic units, and information such as population and basic economics is publicly available for them.AGEBs, or aggregations thereof, present a natural option for defining unit areas (patches) for urban mobility analysis.

Mobile phone sensing data
Mobile phone tracking is a process to identify the location of a mobile phone based on various technologies and methods.These include multilateration (triangulation) of radio signals and using a global navigation satellite system (GNNS or GPS).The first technology uses the constant roaming radio signals that a mobile phone emits and may be picked up by three or more stationary cell towers enabling one to approximate the device location The data that we use in this research corresponds to GPS pings collected by mobile service providers when users access specific (but undisclosed) apps for which they have granted permission to access their location information.
On September 14, 2020, in the city of Hermosillo, Sonora, Mexico, the University of Sonora entered into an agreement granting access to GPS data from mobile phone reports covering the city area.This agreement, subject to confidentiality conditions, enables us to disseminate scientific findings derived from this data.
The raw data was weekly stored in a daily-partitioned table in BigQuery (a Google Cloud service).Through the API service and SQL queries, we could download specific sub-data bases as well as the whole database, in order to do the data wrangling prior to the analysis.

The variables of mobility data
The mobility data used in this work consists of 80,582,452 records, which contain the variables in Table 1 that we describe afterwards.
id_adv It is a unique alphanumeric ID associated with each device (mobile phone).The database contains 306,963 such IDs.Each time a device utilizes a specific app with GPS localization capabilities enabled, it generates a ping, resulting in the addition of a new record to the database.The number of pings per device ranges from 1 to 18,297.However, 96.41% of these devices have at most 1,000 records in the complete database.
timestamp This field consists of a sequence of characters in the format YYYY-MM-DD hh:mm:ss UTC, indicating the date and time when a device generated a ping.All timestamps fall within the timeframe from 00:00:00 on September 18, 2020, to 23:59:59 on December 13, 2020, in Universal Time Coordinated (UTC).However, for estimating urban mobility, conducting various sanity checks, and determining the residence patch for individuals in the city of Hermosillo (as outlined in Section 4.1), we convert the timestamp variable to the local time zone.Therefore, the local dates and times of each mobility data observation range from 15:00 on September 17, 2020, to 15:00 on December 13, 2020.
lat, lon These two columns in the database contain numerical values indicating latitude and longitude coordinates of the device when a ping was made.To facilitate distance calculations between GPS data points, we transform these coordinate data into Cartesian coordinates using the UTM (Mercator) projection.This transformation was performed using the geopandas library in Python.The latitude values in the database range from 28.9753 to 29.1960, while the longitude values range from -111.1002 to -110.8457.

Spatial and temporal characteristics of the data
Periodic patterns resulting from daily routines such as commuting, school, rush hours, meal times, and nighttime activities significantly impact the mobility of urban populations and the frequency of ping registrations.These patterns can vary based on factors like the day of the week (weekday or weekend) and the availability of local transport systems [17].
In Figure 2, we present the complete time series of the number of pings categorized by both the day of the week and the time of day.This dataset spans 87 consecutive days from 2020-09-17 to 2020-12-13, and is divided in the plot into 13 weeks, starting from 00:00, 2020-09-14 (Monday) and ending on 24:00, 2020-12-13 (Sunday).Each week is visually differentiated by using a different color.The first day (2020-09-17) is represented by the 'short' red line within the Thursday window.The conclusion of the time series is indicated by the 'short' pink line, which ends at 17:00 hrs within the Sunday window.This visualization reveals the temporal regularity, aligning with the typical activity schedules for city of Hermosillo residents.For example, activity levels tend to decrease at dawn and rise during hours associated with work and leisure.Notably, distinct patterns emerge for weekends.
Figure 3 shows the bivariate normal kernel density estimate for traveled distance between consecutive pings and travel time (in a logarithmic scale for proper visualization) using all pings recorded during the study period from a sample of 10,000 id's.This graph was generated using MASS package in R and selecting kde2d's default parameters.The vertical dotted lines mark the corresponding time scale in seconds.Similarly, the horizontal dotted lines mark the corresponding distances traveled in meters.From the figure, we observe that most consecutive pings are at intervals between 2 and 30 minutes.These are divided into those where people move only few meters (likely walking or in a workspace) and those where they move between 30 and 500 meters (likely in a vehicle).At the top right, a third, less important group spends between 2 and 40 hours between pings but moves 500 to 10,000 meters.To carry out censuses and surveys, it is necessary to define, in the geographical area, the study areas.Such units are called Geostatistical Areas and in Mexico these are: State (AGEE), Municipal (AGEM) and Basic (AGEB).The latter is the smallest and fundamental geographical area and may be urban or rural.Urban AGEBs delimit a part or the total of a locality of 2500 inhabitants or more.In larger towns and cities, these AGEBs generally groups 25 to 50 blocks.In the case of the city of Hermosillo, there are 587 urban AGEBs whose population sizes are illustrated in Figure 1B.

Methods
One of the first papers that discussed urban analysis using data originated from mobile phones was [42] and was followed by works that highlighted possibilities of real-time visualization and monitoring of displacements in cities.In general terms, now we can distinguish models of human mobility aimed at reproducing individual mobility patterns or general population flows [6].
The residency-mobility matrix that we estimate can be viewed as a general population flow that describe the mobility patterns for individuals who inhabit each AGEB, but it is based on individual mobility modelling as we first estimate individual paths between consecutive pings.In the next sections we describe the model and the inference we propose.

Residence selection
Since we are interested in describing mobility at the AGEB level, it is necessary to identify the AGEB to which each ping's location belongs.For this, it is essential to transform the latitude and longitude data to the UTM coordinate system and identify the AGEB to which it corresponds.We do this using the AGEBs official polygon information.Now, as we also want to describe the mobility patterns for all individuals that live in each AGEB, and this information is not available for each ID, it is necessary to assign each ID to an AGEB as a residence using the very same database.That is, based on the ping information of each ID we determine the AGEB that its resides.For this, we use the an heuristic that combines criteria of frequency, timeframes, and AGEBs' population size.This selection process is described in Algorithm 1.
The primary idea behind this algorithm is to determine the AGEB of residence for an ID based on where it demonstrates the highest frequency of pings between 22:00 and 06:00 hours.If multiple AGEBs fulfill this criterion, the residence is randomly assigned by sampling from these candidates, with the sampling weighted by their respective population sizes.This heuristic falls under the category of unsupervised methods and represents a variation of the "grid frequency method" [53,59].The main distinctions lie Algorithm 1 Find the residence of all individuals for each individual i do s1 ← AGEBs with maximum GPS data points s2 ← AGEBs with maximum GPS data points between 22:00 and 06:00 where i is selected randomly weighted by population of f i end if end for in the fact that the grids correspond to irregularly shaped patches (AGEBs) that partition the area of interest, and it doesn't necessitate assigning a specific geolocation, such as a patch's centroid, but only the patch ID of residence.

The Brownian bridge model
Suppose that Z rj (t) represents the position of the j-th resident from r-th patch at time t.Then r ∈ {1, . . ., n}, as n is the number of considered patches, and j ∈ {1, . . .N r } where N r is the resident population size in patch r.
Let {z rj(1) , t rj(1) }, {z rj(2) , t rj(2) }, . . .,{z rj(n rj ) , t rj(m rj ) } be the information from the GPS reports for the j-th individual that resides in patch r.The term m rj corresponds to the total of pings collected during the observation time, and z rj(i) denotes the position of the j-th resident from the r-th patch at its i-th reporting time t rj (i).If we set the observation period starting with the first ping, then the individual observation period is [0, T rj ] with T rj = t rj(n) − t rj (1) .

Now, let Z
[k] rj (t) denote the unobserved position of the j-th resident from r-th patch at time t ∈ t rj(k) , t rj(k+1) which undertakes a random walk from positions z rj(k) to z rj(k+1) .If a Brownian motion is assumed between these two positions, then Z [k] rj (t) is distributed according to a bivariate normal distribution at time t ∈ t rj(k) , t rj(k+1) , with mean vector and covariance matrix given by where I is the 2 × 2 identity matrix and σ 2 rj is the Brownian motion variance related to the mobility of the j-th resident from r-th patch.Thus, the Brownian bridge process has the property of being normal along a straight line joining the points z rj(k) and z rj(k+1) .In addition, the maximum variability is obtained at the middle of the trajectory and the variance equals 0 when t = t rj(k) or t = t rj(k+1) .This assumption is adequate when the location reports are precise, but in many applications it is convenient to introduce the uncertainty related to measurement errors.For this, we consider that the starting and ending locations are bivariate normal MVN z rj(k) , δ 2 I and MVN z rj(k+1) , δ 2 I , where δ 2 is the variance of the location error.Therefore the expected time spent at location z, during t rj(k) , t rj(k+1) , is where ϕ (• ; µ, Σ) is the probability density function of a bivariate normal distribution with mean vector µ and covariance variance Σ, and Using the (m ri − 1) bridges, we then can estimate the density of the expected occupation time for each individual, as a mixture of the densities (1).That is For computing the overall fraction of time spent in a region A (occupation time in A), by individuals from patch r, we average the individual fractions of times in A considering all its patch residents as For this work, we are interested in estimating P r (s) for each pair of patches r and s and these component constitute the residence and occupation matrix (ROM).If we have that individuals can travel only to any of the n considered patches, the sum of the rows for the matrix [P rs = P r (s)] will add up to one.

Likelihood inference
First, we assume that the variance of the location error δ 2 can be known from the device specifications, while the individual variance σ rj of the mobility of the j-th resident from r-th patch is unknown.We estimate σ rj using the method proposed in [19] (also, see [38]).
Therefore, we consider the n rj as odd numbers and estimate the mean based on the independent Brownian bridges for the non-overlapping time intervals t rj(k−1) , t rj(k+1) k∈{2,4,...,m rj −1} , while regarding the in-between observation times t rj(k) as independent observations from these bridges to estimate σ rj .Thus, Z rj (t rj(k) ) is a bivariate normal random variable with mean and covariance matrix given by where Therefore, the likelihood function of the parameter σ rj can be written as where again ϕ denotes the probability density function of a bivariate normal distribution.
The maximum likelihood estimate for σ rj is the value σrj that maximizes L(σ rj ) in ( 4).Thus for any individual originating from patch r, the estimated probability density function at position z is given by ĥ * r (z) =
Then we can estimate the expected occupation time in A from resident of patch r, based on (3), as If the information of the device's location error δ 2 is unknown, the Markov property in the previous model is not longer true, but we can opt for the BMME extension proposed by [40] based on the joint distribution of B t , Z rj (t rj(1) ), . . ., Z rj (t rj(m rj ) ) ⊤ where Z rj (t rj(i) ) = B t rj(i) + ξ i , i = 1, . . ., m rj , {ξ i } are iid Gaussian variables with mean 0 and variance δ 2 that are also independent of the Brownian motion {B t , t ≥ 0}.
Since B t , Z rj (t rj(1) ), . . ., Z rj (t rj(m rj ) ) ⊤ is a multivariate gaussian vector, then conditional ditribution where Z = Z rj (t rj(1) ), . . ., Z rj (t rj(m rj ) ) , , and Then the density of the expected occupation time for each individual (2) under BMME, corresponds to Similarly to the previous model, we can compute the maximum likelihood estimates for σ rj and δ using the likelihood function based on the joint density of the increments where Σ X is the symmetric and banded matrix with i, s elements where i, u ∈ {1, 2, . . ., m rj − 1}.
The downside of introducing δ as a parameter extra to estimate is outweighed by the advantage of using all the data at the same time to estimate all parameters and the fact that this inference is not computationally more expensive to implement.Once the maximum likelihood estimates are calculated we obtain the expected occupation time in region A similarity to (5) and using (6), In general, the considered patches can have very irregular shapes and for computing the integral part in the previous expression we require to resort to numerical methods.For the application we opt for the uniform sampling Monte Carlo integration.Table 2: Periods within which the mobility matrices were estimated.

Data filtering and residence selection
In this section, we describe the periods of time we consider for the ROMs estimations, the filtering process to select the devices to include in each estimation, and the resulting residence selection.
We have chosen three periods, each divided into two parts, to estimate the mobility patterns.The objective is to compare the resulting estimates matrices pairwise for each period.Table 2 presents these time frames.
The reason for selecting these periods was to study the moderate to possible important changes in mobility patters in relation to the implemented control measures for COVID-19.The Secretary of Health of the State of Sonora adopted the "epidemic traffic light" and according to its colors (green, yellow, orange and red) the local government implemented specific measures.Some were recommendations, such as remote working and protective measures, but some other were more strictly imposed, such as limitations on gatherings, capacity restrictions for businesses and remote schooling.Figure 4 shows the number of daily COVID-19 cases for the city of Hermosillo between the second half of 2020 and the beginning of 2021.The existing values for the epidemic traffic light are indicated on the top of this graph.
To visualize the general epidemic pattern Figure 4 includes the moving average of cases.It also indicates the period of time for which we have the GPS information as the interval between the two vertical lines.Since the vaccination campaign started on 2021, all controls during this period were in relation to hygiene protocols, travel restriction, social distancing, and stay-at-home orders.
The contemplated periods and its parts are also represented in Figure 4 as the grey segments above the curve.It becomes evident that all selected periods primarily fall within the yellow alert phase for COVID-19 and are positioned between the fist and second epidemic waves in Hermosillo.Additionally, the second parts of periods 1 and 3 exhibit no Regarding the urban mobility, we postulate that the second part of each period could coincide with an increase in the movement of people within the city, as they encompass the dates of Halloween (November 31st) as well as the traditional Mexican holiday Día de Muertos (November 1st and 2nd).Both festivities are fervently celebrated in cemeteries, markets, and streets.This surge in mobility may be associated to the delayed rise in COVID-19 cases that marked the beginning of the second wave.

Data selection
The original mobility data in the database consists of 80,582,452 pings that register the timestamp of 306,963 devices (IDs).However, to proceed with the estimation, we filter them to ensure a minimum of pings per device in each part.Table 3  To perform the Brownian bridges estimation, we filter the mobile phone database for each period by dropping the IDs with less than 10 pings in a week.That is, for each period and each part, we keep IDs with at least 11 weekly pings in any of the two parts.Then the number of selected IDs for each period-part are: 108,252 (P1 A ), 123,878 (P1 B ), 108,252 (P2 A ), 120,156 (P2 B ), 102,091 (P3 A ), and 103,184 (P3 B ).
The population in Hermosillo city is about 936,263 inhabitants, then the number of IDs selected to represent the mobility is above 10.9% for any period-part.

Estimation of residence and occupation matrices for Hermosillo
Based on the selected IDs and their pings information during each period-part (Table 2), we obtained the maximum likelihood estimates for σ rj , (r = 1, . . ., n, j = 1, . . ., m rj ) and δ.
With this estimates, in turn, we are able to approximate the ROM for each period-part using (8) as where i,j are the respective residence and visiting patches (i, j = 1, . . ., n).
As we mention above, the integral is numerically solved using uniform sampling Monte Carlo integration.For this we randomly sampled 2-dimensional points within the area of Hermosillo and its metropolitan area, identified the patch (AGEB) to which each belongs and evaluate the density h ik z; σik , δ on them.
The result of this analysis renders 6 different matrices, each of dimension close to 502×502, as we consider n = 502 different patches (AGEBs) and few of them did not keep any IDs after filtering.The complexity of this level of information makes it challenging to convey solely through figures.A map may only capture a fraction of this data, such as a single row of the ROM, depicting the expected proportion of time that individuals residing in an AGEB would spend in all other AGEBs.Given the inherent complexity of representing the entire residence-mobility matrix in relation to spatial layout, we utilize alternative metrics to effectively identify certain attributes of mobility patterns relative to the AGEBs of residency.The first approach involves assessing the distance between the ROMs computed from both parts of each period.The second alternative entails presenting two maps that illustrate differences in two specific aspects of mobility: namely, the absence of mobility and the proportion of time spent by all individuals (who leave their AGEB at any given time point) in other AGEBs

Matrix distances
As our objective of assessing differences between the estimated matrices by period, we compute three of the most relevant matrix distances for the part A and B matrices in each period.These distance functions are the Manhattan (Mh), Euclidean (E) and Minkowski (Mk) distances, which can be computed for any two matrices with the same dimensions.Let [C] ij = c ij and [D] ij = d ij be two real matrices with the same dimensions.Then Minkowski distances is defined as follows: The Manhattan and Euclidean distances correspond, respectively, to the Minkowski distance with p = 1 and p = 2.We can be observed that these functions are symmetric and null only when all the entries of the matrices coincide.
The ROMS are estimated using the filtered data in each period-part and the number of residents may differ from one to the other.To produce matrices with the same dimensions in both parts of each period, and obtain their distance, we selected the AGEBs with at least one resident in both parts.Then for each period, we compare matrices with number of rows: 469, 479 and 463.Table 4 presents the distances for the estimated ROMs derived from the two parts in each period.
We observe that for P1, distances are consistently higher.Although its parts are defined when Hermosillo presented a low number of cases (see Figure 4), P1 A occurs several weeks after the first COVID-19 wave, while P1 B precedes it and encompasses two significant festivities: Halloween (October 31st) and Día de Muertos (November 1st and 2nd).The preparations for these celebrations and the ensuing social gatherings (in homes, markets, and cemeteries) can be associated with the increase in urban mobility and social interaction that facilitated the subsequent wave.
In contrast, the distance between the estimated ROMs for P2 A and P2 B is closer, not because they are closer in time, but due to the observed increase in data and the eventual shift to the orange alert level, which could once again restrict urban mobility.

Differences for mobility characteristics
To further study the pattern changes that occurred in Hermosillo, we propose dividing the population into two categories and them estimating the ROM for one of them.These categories are: individuals who never leave their residence patch in each period-time and those who do it.
We estimate the proportion of individual who never leave their AGEB as the proportion of IDs, in each period-time, that did not have any GPS report outside their AGEB of residence.This information is stored in 6 vectors, each of length close to 500 (as again, few AGEBs did not report any residents).
Considering the AGEBs present in each pair of periods, Figure 5A illustrates the differences in these proportions (proportion during Pj A − proportion during Pj B , j = 1, 2, 3).For each of the three periods, it is evident that during its second parts, this proportion is slightly larger for most AGEBs, indicating that there is no evidence that more people tend to leave their own AGEB.Now, focusing on the data from residents who left their AGEB, Figure 5B depicts the discrepancies in the daily fraction of time they spend in their own AGEB (the proportion of time in their own AGEB during Pj A − the proportion of time in their own AGEB during Pj B , j = 1, 2, 3).This is akin to examining the difference between the diagonals of the square ROMs, which are estimated considering only those individuals with at least one ping outside of their residence.
From Figures 5A and 5B we observe that while the number of indivuals leaving their AGEBs do not increase, those who do leave, spend less time within their residence.This trend is particularly evident during P1 and may potentially be linked to the preparation for festivities, which is also reflected in the larger distance measures between the ROMs for P1 A and P1 B (Table 4).
The selection of periods and parts was carried out with the objective of testing the consistency and sensitivity of the produced estimates for mobility.As anticipated, for P3 A and P3 B , which are consecutive and fall under the yellow epidemic traffic light, the estimates yielded minor differences.For P1 and P2, their second parts are separated by gaps of three and four weeks, respectively, from their first parts.As mentioned earlier, the mobility changes observed during P1 can be significantly attributed to preparations for and festivities surrounding the holidays during P1 B .However, P2 B includes only the two days of the second festivity (November 1st and 2nd) and the local government decided to increase mobility restrictions to the orange light level on November 9th, resulting in smaller distances for the matrices estimated for P2 (see Table 4).
Regarding the AGEBs and their population, it is notable that AGEBs with the greatest change in mobility for P1 and P2 are identified as having higher population density and a higher index of marginalization than the rest of the city.These areas correspond to the northern zone of the city (north of Blvd.Progreso) and the southwest zone (the area delimited by Paseo del Río Sonora and Blvd.Las Quintas), which is a residential area with a history of vehicular traffic problems due to the high population density [20].
An important aspect to note is that both parts of P1 and P2 occurred during the yellow epidemic traffic light, and throughout these weeks, the number of reported cases remained relatively stable (see Figure 4).However, from just these two pieces of information (implemented mobility restrictions and recent observed number of cases), we cannot predict the next COVID-19 wave in the city.From this, it becomes evident the practical relevance of harnessing mobility information for predicting and mitigating epidemic outbreaks in a given region.Having this information can help in future planning, as the risk of experiencing a second wave could have been associated with the observed longer visiting times to other AGEBs during P1 B (Figure 5).
The mobility patterns depicted in the maps of Figure 5 represent just a partial visualization of a complex information system that incorporates population density (Figure 1B) and the visiting time patterns of individuals.All this information is included in the epidemic model in the next section and is necessary for justifiable analyses of the effects of mobility on infection cases.

The multi-patch epidemic model with mobility and residency
Human mobility plays major role in the geography of health and epidemiology, as it is a capital factor in the reappearance and persistence of diseases [27].Particularly, the explosive urban growth and mobility of people within and between urban regions are factors that affect the geographical dispersion of infectious diseases [4].An approach to incorporate the spatial dynamics of mobility into infectious diseases is the use of mathematical models with differential equations that incorporate residency and mobility to describe human mobility within and between geographic regions in a synthesized manner [2,18,22,52].To our knowledge, articles following this approach to modeling disease dynamics focus on specifying particular structures for populaton residence, mobility or occupation.They implement human mobility models that are often not estimated (such as gravity and radiation models), or theoretically or through simulations these works demonstrate the mobility effects on various characteristics of the studied disease dynamics, such as disease transmission, endemicity, and epidemic peak and duration.However, from an applied standpoint, an important challenge lies in estimating the occupation time by residence.In fact, the problem of estimating ROMs in urban areas poses a theoretical and computational challenge due to the heterogeneity of individual human mobility within cities. Simply put, knowing how much time individuals residing in patch i of a city spend in patch j of the same city requires knowledge of which city residents live in patch i, as well as their continuous locations over time.The available information systems that come closest to providing continuous location data over time for general population are mobile phone detection systems.However, most of applications using this data estimate only aggregated data such as origen-destination for commuters or migrants, and not the fraction of time that we can expect residents spend in each patch.That is, to our knowledge, no formal methodologies have been published regarding the use of mobile phone detection data to estimate the expected time individuals visit each patch given his/her residence location.

Epidemic model
In this research, we employ the presented methodology to estimate the mobility and occupation parameters in a multi-patch epidemic model and assess their effect on it.To achieve this, we utilize the parameters estimated during the different periods as outlined in Table 2.
The following meta-population multi-patched SEIRS compartmental model has been theoretically studied in [2] and we refer the reader to this article to have the intuition behind its formulation and derivation.
The considered epidemic model evolves in n different patches with {N i } n i=1 inhabitants, and it is formally posed as: for i = 1, . . ., n and where i and j are the index for any two patches.The term pkj corresponds to the parameter product α k p kj , i = 1, 2, • • • , n.
Table 5 contains the definition of the parameters used in the model.The parameters α = {α i } n i=1 and P = {p ij } n i,j=1 describe the mobility and occupation by residence, and these can be estimated using the methodology used to estimate the ROM.
To appraise the role of mobility on the infection dynamics in Hermosillo, we first estimate the parameters α and P using the GPS information during each of the periods and parts described in Table 2. Then we obtain the solution for the system (5) for t ∈ (0, 200), using fixed epidemic parameters Λ i , β j , µ i , τ i , ψ i and κ i , i = 1, . . ., n, and each of the estimated mobility parameters α and P. Finally, we compare the effects of changes in the the resulting mobility parameter by obtaining the difference between the two generated infection counts for each period.
For instance, to measure the epidemic effect of mobility modification that occurred during P1, we estimate the mobility parameters for P1 A and P1 B , and compute the difference between the infection counts from (5) and under each residence and mobility matrix.Per capita rate at which the exposed individuals in patch i becomes infectious.
Table 5: Description of the parameters in model (10).

Estimates for mobility parameters
The mobility parameters categorize the population into two groups: individuals who never leave their residential patch and those who do.The parameter α describes the distribution for the proportion of individuals who leave their patch, while P refers to the occupation times of individuals in this same category.
We estimate the parameter α = {α i } n i=1 in a similar way to how we produced the vectors depicted in Figure 5A.For each period-part and i, we estimate α i as the proportion of IDs inhabiting AGEB i and registering at least one ping outside AGEB i.Then, the vectors represented in Figure 5A correspond to the estimation of 1 − α i , as the percentages are obtained relative to the total.
The matrix P = {p ij } n i,j=1 corresponds to the ROM for individuals who leave their patch.We estimate it as the ROM but considering only IDs that where used in the computation of α.In other words, we filter the data to retain only IDs that registered at least one ping outside their AGEB of residence and use (9).Then P is the ROM conditioned to individuals leaving their AGEB.
The epidemic model does not consider a ROM for individuals not leaving their patch, as it certain that they spend all day withing their own patch.

Mobility effects on the outbreaks
In order to examine how mobility influences the spread of disease across all the AGEBs in Hermosillo, we set the epidemic parameters and initial conditions for all simulations.The initial number of infected individuals was determined based on observed COVID-19 infection data in Hermosillo, with one initial case each in four AGEBs identified by their respective IDs: 2956, 3367, 5734, and 6200.Thus, for the numerical simulation, the initial values were set as follows: The demographic parameters used in the simulation were obtained from existing literature.The crude death rate for the state of Sonora, Mexico, according to [24], is 6% per 1,000 persons per year.As we are modeling infection dynamics on a daily basis, we used a constant natural death rate of µ = 0.06/(1000 × 365) for each individual in any AGEB.Population sizes, N i , for each AGEB were obtained from the 2019 Mexico census conducted by the National Institute of Statistics and Geography (INEGI).
For this analysis, we assumed an almost constant population size.Utilizing the natural crude death rate we calculated the recruitment rate of susceptibles for each AGEB as Λ i = µN i .Following [28], who posited that the contact rate of COVID-19 falls within the range β ∈ (0.5944, 1.68), we set β i = 1.5 as the daily rate of infection in each AGEB.Latent and recovery rates, as well as the rate at which recovered individuals become susceptible, were set as κ −1 i = 7 days, λ −1 i = 14 days, and τ −1 i = 180 days, respectively, based on diverse literature on COVID-19.
Figure 6 illustrates the disparities in infection counts derived from estimated mobility parameters across different periods ("I(t) under Pj A " − "I(t) under Pj B ", j = 1, 2, 3), shedding light on the impact of mobility on disease dynamics both at the level of individual AGEBs and globally.For instance, Figure 6A reveals a swift progression of the disease in most AGEBs during P1 B compared to P1 A , particularly within the initial period of approximately t = 30 days.This suggests that for the majority of AGEBs, their epidemic peak occurred earlier under the estimated mobility parameters in P1 B .As the epidemic curve evolves under the estimations derived from P1 A , curves in Figure 6A change signs.
Figure 6B showcases the disparity in the sum of infection counts for each AGEB between P1 A and P1 B over time.Due to varying population sizes among AGEBs, their contributions to the total number of cases vary.Notably, a few curves deviating from the described pattern do not significantly impact the overall trend at the global level.Similar patterns are discernible in Figures 6C and 6D, as well as Figures 6E and 6F, illustrating disparities in individual AGEB and global infection counts across P2 and P3, respectively.
From Table 4, we have that P1 had the most distant estimated ROM matrices.Nonetheless, Figure 6 reveals that the global evolution of diseases from the estimated ROM in P1 and P3 exhibits similar magnitudes (though both are more pronounced than in the case of P2).Indeed, during P2 B , the region was under yellow, transitioning to orange epidemic traffic light restrictions by the government, which could realistically reduce infections, as depicted in Figure 6D.
The similarity observed in the global differences for the epidemic curves for P1 and P3 (Figures 6B and 6F) cannot be directly explained by the distance functions.These functions were obtained for the estimated ROMs and not the estimates of the decoupled mobility information α and P. From Figure 5 we compare the changes in both parts for P1 and P3.We observe that in both cases individuals who traveled outside their AGEB spent a large proportion of their time in other AGEBs during their second parts.However, for P1 individuals who traveled spent more time outside, and for P3 the proportion of individuals leaving is smaller.We have to stress that these figures alone cannot fully explain the similarity in the simulated outbreaks under P1 and P3, as they represent only a fraction of the categorized mobility information and do not account for other important factors such as AGEB population sizes.
What we can conclude is that epidemic evolution is deeply linked to population and its mobility patterns.It also becomes evident that simple-summarized mobility models that are introduced into epidemic models may fall short of producing acceptable epidemic forecasts.
We cannot underestimate the effect of mobility in the epidemic dynamics, but it is linked to complex high-dimensional models that brings numerous challenges when directly addressing its statistical inference or model fitting.To confront this, we divide the estimation problem into two parts: first, the estimation of mobility parameters (that we have presented here), and second, the statistical inference of remaining epidemic parameters.Results for the inference of the epidemic model (10) using this strategy, and based on COVID-19 incidence data in Hermosillo, are presented in [1].

Conclusion
In this article, we propose a method to incorporate the geolocalization data to estimate not only an origin-destination matrix but more detailed mobility information on the daily visiting time by individuals according to their region of residency.The method is rooted in Brownian-type stochastic models that allow us to obtain a distribution of individual displacement trajectories, and with them, we can answer multiple mobility questions.
For the considered epidemic model application, we estimate the ROM at the AGEB level that describes the visiting patterns and times for all the individuals that live in each AGEB.This matrix, along with the alpha vector, corresponding to the probabilities that any individual move beyond their AGEB, is an input for a multi-patch SEIR compartmental model applied to the Hermosillo's urban area.
From the results, it is evident the real impact of population mobility on the epidemiological evolution, as we estimate the residence-mobility matrices utilizing actual data from the city of Hermosillo.Consequently, through the epidemic model, we have shown that however small the inter and intra-local residence human mobility may be, the net effect of the same can lead to a noticeable, exigent and momentous local and global infection levels.
As in many models incorporating mobility information, the estimation of residencemobility parameters can be the first step for fitting or estimating all involved parameters.In the case of epidemic models, we are interested in estimating the parameters and solving questions, such as forecasting and effective intervention identification.
The one-step estimation of all parameters is likely not feasible, as very complex models prompt non-identifiability problems.Then the produced estimates presented in this research allow us to reduce the model complexity and permit statistical estimation of the infectious agent parameters.This estimation is the subject of ongoing research.
A challenge that persists in the use of mobile phone data for mobility analysis is the assumption of representativeness of the sample.However, with the increasing use of smartphones, we consider that these databases are becoming more representative samples of the total population.Until then, we can incorporate alternative data to validate or bias-correct some of the estimated parameters.Furthermore, the selection of the residential AGEB can be enhanced not only by incorporating additional information from diverse sources but also by accounting for the uncertainty associated with the resulting allocation.For more precise results, it is desirable to incorporate this uncertainty into the resulting assertions on the mobility by AGEB of residence.
Despite the improvements we can make, the mobility and its estimation, using mobile phone data, already allows generating important inputs to expand the scope of many mathematical models that describe critical social phenomena such as the economy, violence, transmission of information or infectious diseases.With the proposed methodology, other estimates can also be made that consider, for example, mobility by time ranges, such as nighttime and weekends, or the description of mobility by subgroups, such as mobility by age groups and sex in the area of interest.

Figure 2 :
Figure 2: Number of pings by weekday and hour.

Figure 3 :
Figure 3: Kernel density of distance and consecutive pings.

Figure 4 :
Figure 4: Daily confirmed cases of COVID-19 in Hermosillo, Sonora in the time frame of study.The points corresponds to the daily observations, the solid black to the smoothed epidemic curve.The top colored bar indicates the local epidemic traffic light, and the grey indicate the periods and parts in Table2.

(
A) Differences for the proportion of individual that do not leave their AGEB of residency.(B) Differences for fraction of time spent in own AGEB of residency for individuals who leave.

Figure 5 :
Figure 5: Differences for the proportion of individual that do not leave their AGEBs and times spent within it, for those who leave, between the two parts in each period.

Figure 6 :
Figure 6: Differences in individual AGEBS (left) and global (right) infection curves and proportions for the first and second parts of each period.

Table 3 :
contains the number of IDs that in any of the two parts, of the corresponding period, report at least 5, 11, 15, or IDs distribution according to their number of weekly pings per period.21 weekly pings.The table also shows its magnitude as the percentage of the number of IDs with at least one ping in any of both parts.

Table 4 :
(p = 3) Matrices distances for estimated matrices in each period.
The proportion of individuals that leave their residence patch i. p ijThe proportion of time that an individual from patch i spends in patch j, given that is one individual that leaves its residence patch.