COVID-19: Examining the case growth-rate due to Visitor vs Local Mobility using Machine Learning: A Study in the United States

Background: Travel patterns of humans play a major part in the spread of infectious diseases. This was evident in the geographical spread of COVID-19 in the United States. However, the impact of this mobility, and of virus transmission due to local travel compared to travel across state boundaries, is unknown. This study evaluates the impact of local vs. visitor mobility in understanding the growth in the number of cases for infectious disease outbreaks. Methods: We use two different mobility metrics, namely the local risk and the visitor risk, extracted from trip data generated from anonymized mobile phone data across all 50 states in the United States. We analyzed the impact of using only local trips on infection spread, and the infection risk potential generated by visitors' trips from other states. We used the Diebold-Mariano test to compare across three machine learning models. Finally, we compared the performance of models that include visitor mobility for all three waves in the United States and across all 50 states. Results: We observe that visitor mobility impacts case growth and that including visitor mobility in forecasting the number of COVID-19 cases improves prediction accuracy by 34%. Using the Diebold-Mariano test, we found that the performance improvement from including visitor mobility is statistically significant. We also observe that the significance was much higher during the first peak (March to June 2020). Conclusion: With cases present everywhere (i.e., local and visitor), visitor mobility (even within the country) is shown to have a significant impact on growth in the number of cases. While it is not possible to account for other factors such as the impact of interventions, and differences in local mobility and visitor mobility, we find that these observations can be used to plan for both reopening and limiting visitors from regions with a high number of cases.

Many countries have passed the first and second peaks, and aggressive vaccination efforts and containment measures have limited the spread of the pandemic. While countries are beginning to slowly reopen, the threat from the pandemic is far from over, especially as the new delta variant has spread to multiple countries. A question remains as to the risk contribution of external visitors, as travel rebounds while cases come down and restrictions begin to ease.
The importance of tracking human mobility as a significant indicator to understand and predict the spread of COVID-19 has been an active research topic [1,2,3,4,5,6,7]. Researchers and local governments continue to track human mobility in their communities through anonymized cell phone data made available through various data providers [8,9,10]. An earlier study by Badr et al. found a strong linear correlation between mobility ratio and COVID-19 growth rate between January 24, 2020, and April 17, 2020, for the 20 US counties that had the highest number of cases [11]. Several studies in the United States, Europe, and China report an association between mobility, stay-at-home orders, and growth in the number of cases [12,13,14,15,16,17,18,19,20]. More recent work also studied the impact of lockdowns on mobility [21].
Visitor mobility has also been studied in different contexts. For instance, several studies analyzed tourist/visitor demand to various destinations to estimate potential COVID-19 risk exposure [22,23,24]. However, there are no studies that analyze the difference between visitor and local mobility. Linka [13]. A recent European study showed that internal mobility is more important than mobility across provinces for controlling COVID-19, and that the typical lagged positive effect of reduced human mobility on reducing excess deaths is around 14-20 days [18].
A key question facing policy makers in state- or country-level jurisdictions is whether the impact of visitor mobility differs from that of local mobility. In this study, we examine the transmission risk propagated as a result of local mobility vs. the risk propagated from visitor mobility for all the states in the United States. We use two different variables to capture the infection propagation risk, namely the local transmission risk (due to local mobility) and the visitor transmission risk (due to visitor mobility), and evaluate the impact of these variables on case growth. This study was done across all 50 states in the United States from March 2020 to December 2020.

Infection Data
The confirmed case data were retrieved from the Corona Data Scraper open-source project [26], which provides county-level data on the number of new cases per day.
We aggregated the number of cases to the state level.
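The county-to-state aggregation can be illustrated with a minimal sketch. The row layout `(date, state, county, new_cases)` and the function name `aggregate_to_state` are assumptions made for illustration; they do not reflect the actual schema of the Corona Data Scraper files, and the rows below are toy values, not real counts.

```python
from collections import defaultdict


def aggregate_to_state(county_rows):
    """Sum daily new cases over all counties of each state.

    county_rows: iterable of (date, state, county, new_cases) tuples
                 (hypothetical layout, chosen for this example).
    Returns a dict mapping (date, state) to the total new cases.
    """
    totals = defaultdict(int)
    for date, state, _county, new_cases in county_rows:
        totals[(date, state)] += new_cases
    return dict(totals)


# Toy rows for illustration only (not real data).
rows = [
    ("2020-03-01", "LA", "Orleans", 5),
    ("2020-03-01", "LA", "Jefferson", 3),
    ("2020-03-01", "TX", "Harris", 7),
    ("2020-03-02", "LA", "Orleans", 4),
]
state_cases = aggregate_to_state(rows)
```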

Mobility Data
The state-level mobility dataset and metrics were provided by SafeGraph [27]. These data were collected for all trips made between January 1, 2020 and December 31, 2020.

Approach
To measure the impact of mobility (both local and visitor), we model the number of cases at a particular location based on the historical number of cases and the transmission of infection based on mobility. Higher accuracy when a factor is included in the model indicates that the factor is important [28]. The features used to forecast the number of cases are listed below:

Number of Cases
The aggregated new cases from the previous 14 days are used to forecast the number of cases for the next 14 days; earlier studies have shown that the virus incubation period is about 14 days [29].
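A minimal sketch of how such 14-day history and 14-day target pairs could be built from a daily series is given below. It assumes that "aggregated" means a simple sum over the window; the function name `make_windows` and the toy series are illustrative and not taken from the study.

```python
def make_windows(daily_cases, history=14, horizon=14):
    """Build (feature, target) pairs from a list of daily new-case counts.

    feature: total new cases over the `history` days before day t
    target:  total new cases over the `horizon` days starting at day t
    (Summation is an assumption; the text only says "aggregated".)
    """
    samples = []
    for t in range(history, len(daily_cases) - horizon + 1):
        past = sum(daily_cases[t - history:t])
        future = sum(daily_cases[t:t + horizon])
        samples.append((past, future))
    return samples


# Toy series: 30 days with one new case per day (illustrative only).
pairs = make_windows([1] * 30)
```

Each pair keeps the feature window strictly before the target window, so no future information leaks into the input.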

Local transmission risk
The local transmission risk represents the transmission potential of the virus based on the recent number of cases per capita (which represents local case incidence) and the mobility, both at the local level. The local transmission coefficient $LT_i$ for a spatial region $i$ is calculated from $M_{i,i}$, the number of trips whose origin and destination both fall within region $i$, and $C_i$, the cases per 100,000 people at location $i$.

Visitor transmission risk
The visitor transmission risk represents the transmission potential of the virus based on the recent number of cases per capita at the visitor origin. The visitor transmission risk $VT_i$ at a location $i$ is calculated from $M_{j,i}$, the number of trips that originate at $j$ and end at location $i$ with $j \neq i$, and $C_j$, the cases per capita at location $j$. These three measures are illustrated in Figure 1.
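The exact formulas for the two risk measures are not reproduced in this text. The sketch below assumes one plausible form consistent with the definitions: local risk as the product $M_{i,i} \times C_i$, and visitor risk as the sum of $M_{j,i} \times C_j$ over all origins $j \neq i$. Both forms, the function names, and the toy inputs are assumptions for illustration and should not be read as the authors' stated equations.

```python
def local_risk(trips, cases, i):
    """Assumed form: LT_i = M[i][i] * C[i]."""
    return trips[i].get(i, 0) * cases[i]


def visitor_risk(trips, cases, i):
    """Assumed form: VT_i = sum over origins j != i of M[j][i] * C[j]."""
    return sum(
        trips[j].get(i, 0) * cases[j]
        for j in trips
        if j != i
    )


# Hypothetical toy inputs: trips[j][i] is the number of trips from
# origin j to destination i; cases[j] is the case incidence at j.
trips = {
    "A": {"A": 100, "B": 10},
    "B": {"A": 20, "B": 200},
}
cases = {"A": 5.0, "B": 2.0}

lt_a = local_risk(trips, cases, "A")    # within-A trips times incidence in A
vt_a = visitor_risk(trips, cases, "A")  # trips from B into A times incidence in B
```

Under this assumed form, a region with few local cases can still carry a high visitor risk if it receives many trips from high-incidence origins.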

Machine Learning Methods
We employ various machine learning techniques to forecast the number of cases based on local vs. visitor mobility. The general idea is to evaluate whether including visitor transmission risk improves forecasting performance, by comparing the relationship between the future number of cases and historical local mobility against the same relationship when visitor mobility is added. Machine learning models are more capable of capturing the non-linear relationships between various features. The abundance of COVID-19 data (case data and mobility patterns) enables us to identify complex relationship patterns. In this study, three popular machine learning methods, namely Linear Regression, Random Forest Regression, and XGBoost Regression, were used to forecast the number of cases. These models take into account the historical number of cases, the local transmission risk, and the visitor transmission risk when forecasting the future number of cases.

Evaluation Criteria
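The predictive performance of the proposed approach for each of the states is compared using the following two metrics.
Mean absolute percentage error (MAPE) measures the average percent of absolute deviation between actual and forecasted values:
$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{A_t - P_t}{A_t} \right|$$
Root mean squared error (RMSE) captures the square root of the average of the squared differences between actual and forecasted values:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( A_t - P_t \right)^2}$$
where $N$ is the number of test samples, $A_t$ is the actual value, and $P_t$ is its respective predicted value. For each of the techniques, we evaluate the accuracy of prediction with and without using the visitor transmission risk. We use the Diebold-Mariano (DM) test to assess whether the difference between the two sets of forecasts is statistically significant; the alternative hypothesis is that the two models differ in forecast accuracy, i.e., the forecasts are not similar using the two models.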
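The two error metrics, together with a basic form of the Diebold-Mariano test, can be sketched as follows. The text does not state which loss function or variance estimator was used for the DM test, so squared-error loss, a rectangular-window long-run variance over lags up to `h - 1`, and a normal approximation for the p-value are all assumptions here. The forecast values are toy numbers, not results from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def diebold_mariano(actual, pred_1, pred_2, h=1):
    """DM statistic and two-sided normal p-value under squared-error loss.

    A positive statistic means pred_1 has the larger average loss.
    h is the forecast horizon; autocovariances up to lag h - 1 enter the
    long-run variance. For h > 1 this estimate can be non-positive in
    small samples, in which case the statistic is undefined.
    """
    d = [(a - p1) ** 2 - (a - p2) ** 2
         for a, p1, p2 in zip(actual, pred_1, pred_2)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Toy forecasts for illustration only (not values from the study).
actual = [100, 120, 150, 130, 170, 160]
without_visitor = [80, 150, 120, 160, 140, 190]
with_visitor = [95, 125, 145, 135, 165, 165]

rmse_without = rmse(actual, without_visitor)
rmse_with = rmse(actual, with_visitor)
dm_stat, dm_p = diebold_mariano(actual, without_visitor, with_visitor)
```

The normal approximation is asymptotic, so with short test series a small-sample correction may be preferable.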

Results
Tables 1 and 2 show the comparison of the machine learning forecasts with and without the inclusion of visitor mobility and local mobility. We compare the performance of the three machine learning models (XGBoost, Linear Regression, and Random Forest) using the MAPE and RMSE. In addition, a DM test was performed to evaluate the significance of the forecast improvement when visitor mobility is included in the model. Table 5 shows that the inclusion of external mobility leads to better forecasts for all 50 states in the United States.
We make similar observations when the data is separated into three waves (Tables 3 and 4). While it is apparent that the majority of the visitor transmission risk is due to travelers crossing state boundaries from neighboring cities, there is also consider-

Conclusions
In this paper, we evaluated the impact of the disease transmission risk due to visitor and local mobility on the number of cases at the state level for all 50 states in the United States. We observed that visitor mobility is an important factor in explaining case growth. The prediction accuracy improved by 29% for the whole duration of the pandemic in 2020 (March to December) when visitor mobility was used in the forecasting model. The impact of transmission risk due to external mobility is observed across all three phases of the pandemic in the United States. We observe that the influence of mobility is much stronger in the first phase of the pandemic than in the second or third phase. These observations are consistent with some of the earlier studies [4,11] in which mobility was observed to be an important predictor of case growth in the first phase of the pandemic.