Social Distancing, Temperature, BCG and the Evolution of Covid-19: A Panel-Model Analysis

Background: This work presents results on the impact of several variables mentioned in the literature on the daily variation rate of cases (per million inhabitants) of COVID-19, based on a large sample of countries and an empirical model with appropriate control factors. We also propose indicators for measuring social distancing and BCG (Bacillus Calmette-Guérin) vaccine immunization. Methods: A statistical panel model was applied to daily data for 165 countries from January 22 to July 31, 2020. In addition, two indicators were constructed for each country in the sample. The first measures social distancing, based on the percentage of people circulating in transport stations as a proportion of the circulation in a period before the pandemic. The second estimates the current percentage of people immunized by the BCG vaccine, based on historical coverage and demographic factors. Results: We estimate that strict social distancing may be associated with a reduction of around 6 percentage points in the daily variation rate of cases per million of COVID-19. In addition, the effects of temperature and BCG immunization proved to be statistically significant at the usual levels, indicating that lower temperatures and low (or no) BCG immunization may be related to increases in the daily variation rate of cases (per million inhabitants) of COVID-19. Conclusions: The analysis makes clear the role of social distancing in controlling the pandemic. In addition, the method used does not allow us to exclude the hypothesis that the evolution of COVID-19 may be positively associated with lower temperatures and low (or no) BCG immunization.


Introduction and Objective
The governments of several countries affected by COVID-19 (Coronavirus Disease 2019) have been introducing social distancing policies at different levels. Considering the negative economic impacts of these policies, it is important to measure how much the level of social distancing contributes to controlling the pandemic.
Additionally, some empirical studies indicate that the incidence of the H3N2 and H1N1 viruses and other variants of influenza is greater in regions and/or periods of lower temperature and humidity. It is therefore worthwhile to investigate whether a similar pattern is observed in the transmission of COVID-19. Another hypothesis raised in the literature is that the BCG vaccine may offer some protection against COVID-19. If this is true, countries with no BCG vaccination programs would tend to present more cases of the disease. This aspect was also investigated in this study.
The objective of the present work is to investigate and measure the possible association between the above variables and the daily variation rate of cases of COVID-19, using a statistical model for panel data. This study adds to the related literature by proposing indicators for social distancing and BCG immunization, by using a wide set of control variables and, regarding the sample, by covering data for 165 countries over 192 days, a larger scope than other related works.
The study makes clear the importance of social distancing in the mitigation or suppression of the pandemic. Additionally, the results suggest that lower temperatures and/or less BCG immunization may be related to an increase in the daily variation rate of registered cases per million. These results were obtained while controlling for a variety of other variables that, according to the literature, might also affect the evolution of COVID-19.
Finally, one must keep in mind that, given the absence of underlying epidemiological models for our analysis, the results obtained here should not be used for the definition of policies without being complemented by additional studies in related areas.

Bibliographic Review
Below we present the bibliographic review detailing the effects of each one of the main variables considered in this work on the evolution of COVID-19.

Effects of Social Distancing
Several recent studies investigate the effect of social distancing on the evolution of COVID-19, using different kinds of measures. [3] incorporate the effects of social distancing through the basic reproduction number, evaluating the outcomes under different simulated conditions, which allows them to assess how the mortality rate changes when social distancing is more effective. [2] use a variable that represents the use of the air transport system, considering information on 122 airports in Brazil. [16] construct their social distancing measure from the American Time Use Survey (ATUS), for U.S. citizens across American states only, which accounts for the number of minutes they spend in activities that could potentially expose them to crowds. [10] use census data to obtain the distribution of workplaces, households and schools, and make assumptions about these variables to parameterize the patterns of contact among people. [21] study the changes in the growth of COVID-19 cases before and after the implementation of social distancing measures in the U.S., using data on school closures, workplace closures, cancellation of public events, etc. [14] use the variation in circulation in workplaces from the Google COVID-19 Community Mobility Reports database. [1] use a binary variable that indicates whether a country has implemented a partial or complete lockdown for its population, and also consider a variable that represents the number of days since the policy was implemented.

Effects of Climate Variables
The influence of temperature and humidity on the evolution of the pandemic is addressed in several studies. Some examples are [11], [23], [5], [20] and [4]. A general result of these works is that a reduction in the variation rate of COVID-19 cases per million may be associated with an increase in temperature. The present work investigates this relationship further, using selected control variables that allow the impacts of the climate variables to be isolated from those of other confounding factors.

Effects of BCG
Some authors investigate the relation between BCG immunization and the evolution of COVID-19. [17] and [19] categorize BCG coverage as a binary variable: countries with universal BCG vaccination policies and countries that never had such a program. [13] compare the number of cases of the disease (per 100 thousand inhabitants) between an Israeli generation that was vaccinated and another that was not. [7] mention mechanisms, identified in controlled studies, by which BCG immunization reduces the severity of infections caused by other viruses. These authors recommend randomized studies to investigate a relationship between COVID-19 and BCG immunization. [12] merged country-age-level case statistics with the start and termination years of BCG vaccination policies and the respective types of strains used, and conducted regression discontinuity and difference-in-differences analyses to test the hypothesis of a BCG effect.
A common aspect of these works is that they only verify whether or not a country has a current BCG vaccination program. However, some countries have only recently interrupted their programs, so that a large part of the previously vaccinated population would still be immune. For this reason, [15] use a third category in their study: countries that interrupted the vaccination program. A remaining problem, nevertheless, is that this variable places countries with very different percentages of BCG-immune population at the same level. For example, countries that stopped their vaccination programs decades ago and countries that stopped more recently fall into the same category.
It is important to mention that preliminary studies by the World Health Organization [24] neither corroborate nor rule out a relationship between BCG immunization and COVID-19.

Our Contribution
The present study estimates the impacts of some variables mentioned in the literature on the evolution of COVID-19 with a statistical panel-data model, using appropriate control factors. In addition, we contribute two indicators. The first measures social distancing, based on the percentage of people circulating in transport stations relative to the period before the pandemic (obtained from GPS data mapping). The second estimates the percentage of people immunized by the BCG vaccine in each country, based on the historical coverage up to the last year in which the vaccination program was in effect, updated by demographic projections.

Data
We present below the variables considered in the study and, in the case of social distancing and BCG immunization, the specific indicators constructed.

Cases of COVID-19 (per Million Inhabitants)
The daily number of new cases of COVID-19 per million inhabitants (footnote 5) was obtained from the World Health Organization website (footnote 6) for 165 of the 202 countries that had registered cases up to July 31, 2020.
Footnote 5: Cases per million = (Number of Cases / Population) * 1,000,000. We consider cases per million inhabitants because a country X with the same growth rate as another country Y, but with a larger population, will tend to present more cases.
Footnote 6: https://covid19.who.int/
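The normalization in footnote 5 can be illustrated with a minimal sketch. The function name and the population figures below are hypothetical and serve only to show why two countries with the same raw count can differ widely once scaled.

```python
def cases_per_million(cases: float, population: float) -> float:
    """Scale a raw case count to cases per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Same quantity as (cases / population) * 1,000,000 in footnote 5.
    return cases * 1_000_000 / population


# Hypothetical figures: the same 5,000 cases in a small and a large country.
small_country = cases_per_million(5_000, 10_000_000)
large_country = cases_per_million(5_000, 200_000_000)
print(small_country, large_country)
```

The larger country shows a much lower value per million, which is why the study works with the scaled series rather than raw counts.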

Climate Variables
The climate data were taken from the National Centers for Environmental Information (NOAA NCEI), using the Global Summary of the Day dataset. Daily average temperature, daily dew point temperature and total precipitation were collected for each weather station in each country. The country-level daily average temperature, average dew point temperature and total precipitation were then calculated as the simple average across these weather stations. Relative humidity was calculated with the formula presented by [5].
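The aggregation and humidity steps can be sketched as below. The formula of [5] is not reproduced in this text, so the sketch substitutes the widely used Magnus approximation as an assumption; it may differ from the formula the authors applied. The function names, constants and station readings are illustrative.

```python
import math
from statistics import mean

# Magnus-approximation constants (an assumption; not necessarily those of [5]).
MAGNUS_A = 17.625
MAGNUS_B = 243.04  # degrees Celsius


def relative_humidity(temp_c: float, dew_point_c: float) -> float:
    """Approximate relative humidity (%) from air and dew point temperature."""
    saturation = math.exp(MAGNUS_A * temp_c / (MAGNUS_B + temp_c))
    actual = math.exp(MAGNUS_A * dew_point_c / (MAGNUS_B + dew_point_c))
    return 100.0 * actual / saturation


def country_daily_average(station_values: list[float]) -> float:
    """Simple (unweighted) average over a country's weather stations."""
    return mean(station_values)


# Hypothetical readings from three stations on one day (degrees Celsius).
temperature = country_daily_average([24.0, 26.0, 25.0])
dew_point = country_daily_average([14.0, 16.0, 15.0])
print(temperature, dew_point, relative_humidity(temperature, dew_point))
```

A simple average treats every station equally, as the text describes; a population-weighted average would be a different design choice.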

Social Distancing
In the present work, the social distancing measure uses data from the Google COVID-19 Community Mobility Reports database, which reports daily percentage variations of several mobility variables relative to a baseline given by a pre-pandemic period in early 2020, from January 3 to February 6. For example, a mobility value of -5% on day t means that there are 5% fewer people on the streets than the baseline value (the median in the baseline period).
The data are divided into six categories, which are useful for assessing social distancing efforts as well as access to essential services: markets and grocery, parks, public transport stations, recreation, residential and workplaces. For the purpose of this work, the public transport stations category, which captures the trend of mobility through public transport terminals, was chosen as the best proxy for social distancing (footnote 10). This choice reflects the strong relation observed in the sample between this proxy and the distancing measures effectively announced by the countries. In fact, the more people use public transportation, the more they go to places of work, recreation or residence, for instance, so this proxy is expected to be related to most of the others.
The next step was to create a variable called Circulation Index, $CI_{it}$, defined as the inverse measure of social distancing (the greater the distancing, the lower the circulation). After a sequence of statistical tests, the clearest effects were observed when the variable was divided into two categories: circulation below 40% and circulation greater than or equal to 40%, taking the value 1 in the first case and 0 in the second. Thus, we created the following binary variable, called strict distancing, in which a 21-day lag is applied to the circulation measure:

$$
SD_{it} =
\begin{cases}
1, & \text{if } CI_{i,t-21} < 40\% \\
0, & \text{otherwise}
\end{cases}
$$

To illustrate, Figure 1 presents the case of Italy, showing the evolution of cases per million and the Circulation Index, which, in a certain period, dropped below 40%.
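The construction of the strict distancing indicator can be sketched as follows. The exact definition of the Circulation Index is not recoverable from this text, so the mapping below (100 plus the percentage change in transit-station mobility) is an assumption, as are the function names. The 40% threshold comes from the text and the 21-day lag from the paper's footnote on SD_it.

```python
from typing import Optional

STRICT_THRESHOLD = 40.0  # percent of pre-pandemic circulation (from the text)
LAG_DAYS = 21            # lag mentioned in the paper's footnote on SD_it


def circulation_index(mobility_change_pct: float) -> float:
    """Assumed mapping: a -60% change in transit mobility gives CI = 40."""
    return 100.0 + mobility_change_pct


def strict_distancing(mobility_series: list[float],
                      lag: int = LAG_DAYS) -> list[Optional[int]]:
    """Return SD_t for each day; None where the lagged value is unavailable."""
    indicators: list[Optional[int]] = []
    for t in range(len(mobility_series)):
        if t < lag:
            indicators.append(None)  # no observation lag days earlier
            continue
        ci_lagged = circulation_index(mobility_series[t - lag])
        indicators.append(1 if ci_lagged < STRICT_THRESHOLD else 0)
    return indicators


# Hypothetical series: five days at -70% mobility, then five days at -30%.
series = [-70.0] * 5 + [-30.0] * 5
print(strict_distancing(series, lag=2))
```

The short lag in the usage line is only there to keep the toy series small; with the default lag the first 21 days of any series carry no indicator.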

BCG Vaccine Immunization
As highlighted in the bibliographic review, in order to investigate the relation between BCG immunization and COVID-19, it is not enough to verify whether or not a country has a current BCG vaccination program, since, in countries that have recently interrupted their programs, a large part of the population that had already been vaccinated would still be immune. In the present study, we created a variable "BCG immunization" to reflect the current proportion of immunized people.
To construct this variable, we initially attributed a value of zero to countries that have never had a public vaccination policy for BCG (examples: USA and Italy). Next, for countries that have an active vaccination program (examples: Brazil and China), the historical average percentage of BCG coverage was used. Finally, we considered the countries whose vaccination program was interrupted at some point in time (Australia and a large number of European countries). In these cases, the proportion of people currently immunized was estimated using the historical average coverage recorded up to the year the program was interrupted, together with demographic factors.
An important point in the present study is the incorporation of adequate control variables, or confounding factors: the proportion of elderly people in the population, days since the first case, the capacity of the healthcare system, climate variables and others, all of which are described below.
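The three-case construction of the BCG immunization variable described in this subsection can be sketched as follows. The text does not spell out how the demographic factors enter the estimate for interrupted programs, so the sketch assumes, purely for illustration, that historical coverage is scaled by the share of the current population born while the program was running. The function and parameter names are hypothetical.

```python
def bcg_immunization(status: str,
                     avg_coverage: float = 0.0,
                     share_born_during_program: float = 1.0) -> float:
    """Estimated share of the current population immunized by BCG.

    status: "never", "active" or "interrupted".
    avg_coverage: historical average BCG coverage, between 0 and 1.
    share_born_during_program: assumed demographic factor, between 0 and 1.
    """
    if status == "never":
        return 0.0  # no public BCG vaccination policy
    if status == "active":
        return avg_coverage  # historical average coverage used directly
    if status == "interrupted":
        # Assumption: only cohorts born before the interruption are covered.
        return avg_coverage * share_born_during_program
    raise ValueError(f"unknown program status: {status!r}")


# Hypothetical country: 90% coverage, half the population born before the stop.
print(bcg_immunization("interrupted", 0.90, 0.50))
```

Unlike a binary or three-level classification, this yields a continuous value, so countries that stopped vaccinating decades ago are separated from those that stopped recently.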

Proportion of Elderly People (> 65 Years Old)
Many references indicate that the disease has a greater effect on elderly people, resulting in more recorded cases of COVID-19 in populations with a higher percentage of elderly people (see, for example, [22]). A control variable was therefore used for the proportion of elderly people (people over 65, in relation to the total population) registered in each country, according to World Bank data.

Days Since First Case
This variable refers to the number of days elapsed since the first day on which a case of COVID-19 was recorded in each country, counted up to the last day of the sample.
The objective of incorporating this variable into the model is to control for the natural advance of the disease over time, independent of variations in the values of the other explanatory variables. For example, a reduction in the average temperature in a country, followed by an increase in the growth rate of COVID-19, could lead to a spurious association between transmission and temperature. This is because a naturally elevated growth rate is expected in the initial stages of the pandemic (right after the first registered case), independent of any variation in temperature. [6], for example, establish that infection by COVID-19 evolves very quickly in its initial phase, in which the pandemic presents transmission patterns similar to those of influenza, where the contagious phase begins right after the first symptoms appear.

Capacity of the Healthcare System
Here we consider the number of hospital beds, obtained from the World Bank website. To enable a comparison between countries with different populations, we used the number of beds per thousand inhabitants. It is conjectured that the better prepared the healthcare system of a country is, the better it will be able to treat people and reduce contagion.

Other Variables
Additional variables considered included: testing capacity, income, pollution levels, rainfall, range of transportation system, and whether or not the country had experienced an epidemic recently. However, none of these variables proved to be statistically significant, at the usual levels.

Model
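We used a statistical model for panel data. The dependent variable is the logarithmic variation rate of the number of new cases per million inhabitants between days t-1 and t, registered in country i (footnote 19):

$$
Y_{it} = \ln\left(\frac{C_{it}}{C_{i,t-1}}\right)
$$

where $C_{it}$ denotes the number of cases per million inhabitants registered in country i on day t. See also [21], [8], [2], [18] and [9].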
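The relation between the logarithmic variation rate and the ordinary growth rate, which the paper's footnote describes as a proxy relationship, can be checked numerically. This is a minimal sketch assuming the dependent variable has the form ln(C_t / C_{t-1}); the function names and case figures are illustrative.

```python
import math


def log_variation_rate(c_prev: float, c_curr: float) -> float:
    """Logarithmic variation rate between two consecutive days."""
    return math.log(c_curr / c_prev)


def simple_growth_rate(c_prev: float, c_curr: float) -> float:
    """Ordinary growth rate (C_t - C_{t-1}) / C_{t-1}."""
    return (c_curr - c_prev) / c_prev


# Hypothetical cases per million: a small and a large day-to-day change.
for previous, current in [(100.0, 104.0), (100.0, 200.0)]:
    print(previous, current,
          log_variation_rate(previous, current),
          simple_growth_rate(previous, current))
```

For small daily changes the two rates are nearly identical, while for large changes the logarithmic rate is noticeably smaller, so the proxy is best read as an approximation for moderate growth.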
The equation for the model is as follows:

$$
Y_{it} = \gamma_i + \delta_i t + \phi_t + \theta' X_{it} + \varepsilon_{it} \qquad (2)
$$

where i is the country index (i = 1 to 165); t is the day index (t = 1 to 192, where t = 1 corresponds to January 22, 2020, and t = 192 corresponds to July 31, 2020); $\gamma_i$ and $\delta_i$ represent the effects between countries (intercept and trend) and $\phi_t$ the effects over time (footnote 20); $X_{it}$ and $\theta$ contain, respectively, all the variables described in the Data section and their coefficients; and $\varepsilon_{it}$ is the error term.
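One common way to see what country and day fixed effects do is the within (two-way demeaning) transformation, which removes any additive country and day components from a balanced panel. The sketch below is a generic illustration and not the authors' estimation procedure: it assumes a balanced panel, leaves out the country-specific trend term, and uses hypothetical names.

```python
def two_way_demean(panel: list[list[float]]) -> list[list[float]]:
    """Remove additive row (country) and column (day) effects.

    panel[i][t] holds the value for country i on day t (balanced panel).
    """
    n_countries, n_days = len(panel), len(panel[0])
    country_means = [sum(row) / n_days for row in panel]
    day_means = [sum(panel[i][t] for i in range(n_countries)) / n_countries
                 for t in range(n_days)]
    grand_mean = sum(country_means) / n_countries
    return [[panel[i][t] - country_means[i] - day_means[t] + grand_mean
             for t in range(n_days)]
            for i in range(n_countries)]


# Hypothetical panel made only of a country effect plus a day effect.
country_effects = [1.0, 2.0, 3.0]
day_effects = [0.5, 1.5]
panel = [[a + b for b in day_effects] for a in country_effects]
print(two_way_demean(panel))
```

Because the toy panel contains nothing but the two additive effects, the transformed values are zero up to rounding; whatever survives the transformation in real data is what the explanatory variables are left to explain.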
Initially, variables were selected via F tests, using a backward (general-to-specific) methodology and a significance level of 0.05 as a reference.
In all steps, we applied the usual correction for heteroscedastic and serially correlated errors.
The following variables were significant at the 0.05 level: $\mathrm{Temp}_{it}$ = average temperature in country i on day t; $BCG_i$ = estimated proportion of the population currently immunized by BCG in country i; $FC_{it}$ = number of days since the first case of COVID-19 was registered in country i; $SD_{it}$ = binary variable that indicates whether or not there is strict social distancing, as previously described (footnote 21); $EP_i$ = proportion of elderly people (aged 65 or older) in country i; $HM_{it}$ = average humidity ($10^3$ hPa kg/kg); $HB_i$ = number of ICU (intensive care unit) beds per thousand people; in addition to interaction ($\mathrm{Temp}_{it} SD_{it}$, $FC_{it} SD_{it}$, $EP_i BCG_i$) and non-linear ($\mathrm{Temp}_{it}^2$) effects.
Tables 1 and 2 below compare the estimated log-variation rates for countries with and without strict social distancing, according to the definition adopted here, considering different scenarios (footnote 23).

Table 1 (partially recovered). Estimated logarithmic variation rates (cases per million) without strict social distancing.
Column:  1      2      3      4      5      6      7      8
Rate:    2.12%  2.90%  3.56%  4.35%  5.61%  6.63%  7.77%  7.63%
The $HB_i$ row of the table reads 8, 7, 6, 5, 4, 3, 2, 1.

Footnote 19: The logarithmic rate is a proxy for $Y_{it} = (C_{it} - C_{i,t-1})/C_{i,t-1}$.
Footnote 20: The effects $\gamma_i$, invariant over time, capture differences between countries that have not been explicitly incorporated in the modelling (for example, because they cannot be observed or are difficult to measure). Examples include habits related to hygiene and social interaction. The fixed effects $\phi_t$, which do not vary among countries, capture global changes over time, such as information about the disease and meteorological conditions.
Footnote 21: Recall that $SD_{it} = 1$ if the circulation of people on the streets, relative to the level prior to the pandemic, is less than 40% in country i on day t-21 (a 21-day lag is considered, see footnote 10), and $SD_{it} = 0$ otherwise.
Footnote 22: Only the results relevant to the discussion of the theme of the work are reported. In addition to the fixed effects, a weekend dummy was also used to correct for the effects of underreporting on these dates.
Footnote 23: A similar exercise can be done with the other variables in the model.

Coefficients Interpretation
The magnitude of the temperature effect depends on the values of $SD_{it}$ and $\mathrm{Temp}_{it}$, since both the interaction and the non-linear terms ($\mathrm{Temp}_{it} SD_{it}$ and $\mathrm{Temp}_{it}^2$) were significant. For most of the values in the sample, lower temperatures might favor the evolution of the epidemic.
The negative coefficient of the $BCG_i$ variable indicates the expected effect of BCG immunization on the growth rate of COVID-19: all else being equal, the data used here do not allow us to reject the hypothesis that populations with a higher percentage of immunization experience a milder evolution of the disease. Nevertheless, just as in the case of temperature, the calculation of the impact is not direct and depends, in this case, on the value of $EP_i$, since the coefficient of $EP_i BCG_i$ is significant. In particular, the effect of BCG is attenuated as the population ages. This result makes sense, since the vaccine is given to children and, after many years, the elderly may no longer be immunized.
The coefficient of the variable $EP_i$ indicates that the greater the elderly population, the greater the growth rate of COVID-19. Although a greater number of records among elderly people is expected, the presence of this variable in the model is essential for isolating the estimation of the other impacts. In particular, the BCG effect depends directly on the value of $EP_i$.
The two remaining control variables that proved to be significant were humidity and the number of ICU beds (per thousand people), $HB_i$. The first indicates that the evolution of the disease is more severe in drier climates. This result is in line with [23] (see also [11]). The significance and sign of the $HB_i$ coefficient indicate that, as expected, the greater the capacity of the healthcare system, the better prepared the country will be to isolate and treat infected people, thus reducing the probability of transmission. The variable $HB_i$ is also correlated with income, which may explain the exclusion of income from the final model.
It is worth stressing the importance of specifying a model with adequate control variables. In addition to those mentioned above, the variable $FC_{it}$ also plays a very important role: it controls for the fact that the countries in the sample are, on each day, in distinct phases of the pandemic (footnote 26).

Effects of Social Distancing on the Evolution of COVID-19
Concerning the effects of social distancing policies, as already mentioned, the expected effect is observed, but only when the circulation variable is categorized in two levels: circulation below 40% and circulation greater than or equal to 40% (relative to the value prior to the pandemic) (footnote 27).
In Table 2 we observe a strong reduction in the log-variation rates compared to those in Table 1, suggesting a potential effect of strict social distancing. For example, in the scenario in column 4, the rate decreases from 4.35% to -1.43%, a drop of almost 6 percentage points.
From equation (3), we see that the effect of strict social distancing depends on the temperature and on the number of days since the first registered case of COVID-19. This follows from the significance of the $\mathrm{Temp}_{it} SD_{it}$ and $FC_{it} SD_{it}$ coefficients. For all cases in the sample, regardless of the values of $\mathrm{Temp}_{it}$ and $FC_{it}$, strict social distancing was associated with a reduction in the variation rate of cases per million of COVID-19. In addition, the positive coefficient of $\mathrm{Temp}_{it} SD_{it}$ indicates that strict social distancing may contribute not only to reducing the growth rate of the disease, but also to reducing the impact of temperature on this rate.
Specifically, strict social distancing (as defined here) may be associated with a reduction, in percentage points, of

$$
100 \times \left(0.06024 - 0.00008\,FC_{it} - 0.00024\,\mathrm{Temp}_{it}\right)
$$

in the variation rate of cases per million of the disease. Table 3 reports the effects of strict social distancing for different values of temperature and of the number of days since the first case. To illustrate, a country with an average temperature of 25 °C, where the first case was registered 90 days ago, may have, associated with strict social distancing, a reduction of 4.7 percentage points in the variation rate of cases per million of COVID-19. The effects vary from 3.86 to 6.02 percentage points.
Considering the mean values in the sample, we estimate that strict distancing may be associated with a reduction of 5.97 percentage points in the growth rate of cases per million of COVID-19.
Footnote 26: The omission of relevant control variables leads to inconsistent estimators of the model coefficients, so that the respective estimates would have no practical application. A quadratic term was also considered to capture the drop in the growth rate in some countries at the end of the sample, but it was not statistically significant at the usual levels.
Footnote 27: This strict level of distancing was not reached by most countries in the sample.
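The expression for the reduction associated with strict social distancing can be evaluated directly. The sketch below uses the three coefficients reported in the text; the function and constant names are illustrative, and the inputs are the worked example from the text (25 °C, 90 days since the first case).

```python
# Coefficients as reported in the text for the strict-distancing effect.
BASE_EFFECT = 0.06024
DAYS_COEFFICIENT = 0.00008
TEMPERATURE_COEFFICIENT = 0.00024


def strict_distancing_reduction(days_since_first_case: float,
                                temperature_c: float) -> float:
    """Reduction, in percentage points, in the variation rate of cases."""
    return 100.0 * (BASE_EFFECT
                    - DAYS_COEFFICIENT * days_since_first_case
                    - TEMPERATURE_COEFFICIENT * temperature_c)


# Worked example from the text: 25 degrees Celsius, 90 days after first case.
print(round(strict_distancing_reduction(90, 25), 1))  # prints 4.7
```

The estimated reduction shrinks as either input grows, which matches the reading that the association with strict distancing is strongest early in the outbreak and at lower temperatures.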

Conclusions
The main result of this work is the potential association between strict social distancing (defined here as circulation of people below 40% of the pre-pandemic level) and a reduction in the evolution of cases of COVID-19, in accordance with the current debate about the importance of social distancing for controlling the pandemic. We estimate that strict social distancing may be associated with a reduction of around 6 percentage points in the log-variation rate of cases per million of the disease. In addition, the data and the method adopted do not allow us to exclude the hypothesis that the evolution of cases of COVID-19 may be positively related to low temperatures and low BCG immunization.
The results presented here reflect the use of statistical techniques, as there is no underlying model of an epidemiological nature to allow more specific conclusions. In the absence of this type of complementary information, these results are insufficient for the formulation of public policies.
The authors suggest that it may be worthwhile to continue this work in the future with more recent data. This is particularly important in view of the new regions to which the pandemic has spread with a strong impact, for example, Brazil.