Does TB Vaccination Reduce COVID-19 Infection?: No Evidence from a Regression Discontinuity and Difference-in-Differences Analysis

In the middle of the global COVID-19 pandemic, the BCG hypothesis, which holds that the prevalence and severity of the COVID-19 outbreak are negatively correlated with whether a country has universal coverage of pediatric Bacillus Calmette–Guérin (BCG) vaccination, has emerged and attracted the attention of the scientific community and media outlets. However, all existing claims are based on cross-country correlations that do not exclude the possibility of spurious correlation. By merging country-age-level confirmed case statistics of COVID-19 from 17 countries with the start/termination years of the pediatric universal BCG vaccination policy and age-specific BCG vaccination coverage, this paper examines the role of BCG vaccination in COVID-19 infection. Despite the cross-country evidence from the previous literature, the results of both the regression discontinuity design and the difference-in-differences approach do not support the BCG hypothesis. The results of these previous studies are likely to suffer from spurious correlations.


Introduction
There is no vaccine against the rapidly spreading 2019 novel coronavirus disease (COVID-19), which has caused more than 17 million infections and 680,000 deaths worldwide (Dong et al., 2020). In search of effective prevention and treatment for COVID-19, some researchers found a strong correlation between the health consequences of COVID-19 and national policy for Bacille Calmette-Guerin (BCG), a vaccine for tuberculosis (TB) disease. Miller and others found that both morbidity and mortality due to COVID-19 are associated with early adoption or universal coverage of BCG vaccination, while Sala and Miyakawa found that BCG vaccination slowed down the spread or progression of symptoms rather than reducing COVID-19 deaths (Miller et al., 2020, Sala and Miyakawa, 2020). Shet and others also found a correlation with mortality (Shet et al., 2020). Another study further indicates that BCG strains with fewer epitopes deleted, such as the Japan and Russia/Bulgaria strains, are more likely to be effective due to higher and more frequent responses, although that paper is not about COVID-19 (Zhang et al., 2013). Based on these results, clinical trials for these BCG strains have now been initiated in several countries (see, for example, Medical News Today, 2020).
In the middle of the global pandemic, the BCG hypothesis has quickly attracted the attention of popular media outlets such as Bloomberg and the New York Times (Bloomberg, 2020, New York Times, 2020).
However, the previous findings are based on the cross-country association between health outcomes of COVID-19 and national BCG vaccination policies, and thus do not exclude the possibility of spurious correlation. For example, a country with higher BCG vaccination coverage is more likely to be poor, as infectious diseases (such as TB) are still leading causes of death there. Such a country is less likely to be connected to major economic regions such as China, Europe, and the United States, owing to its trade openness and geographic location.
There is an urgent need to re-examine this newly emerging hypothesis for two reasons. First, if BCG vaccination offers no protection, society should devote its effort to searching for more plausible solutions to combat COVID-19. Second, the growing demand for BCG vaccination against COVID-19, even without actual health benefit, can create a shortage of BCG vaccine for children who actually need it without benefiting anybody else. Clinical trials can provide a secure basis for assessing the effectiveness of BCG vaccination against COVID-19, but it is worth re-examining this hypothesis using a more credible identification strategy before spending more time and resources to test it. This paper tests the hypothesis with the best available identification strategies based on observational data: a regression discontinuity analysis and a difference-in-differences analysis.

Data And Methodology
The study collected data on the years of introduction and termination of the BCG vaccination and the types of strains used from the BCG World Atlas (Zwerling et al., 2011). The data were then matched with country-age-level COVID-19 confirmed case statistics obtained from each government's website. This resulted in a data set covering 17 countries with relevant information: Australia, Colombia, the Czech Republic, Denmark, Finland, India, Japan, Korea, Latvia, New Zealand, Romania, Singapore, Spain, Sweden, Switzerland, Thailand, and Vietnam. Exact age-level case statistics were available in Colombia, the Czech Republic, Thailand, Vietnam, and Singapore. In the remaining countries, only 10-year age-group-level data were available. Of these 17 countries, the Japan strain or the Russia/Bulgaria strain has been used in Japan, Thailand, Colombia, and Latvia. In Colombia, case statistics at the nationality level are available. Hence, this paper focuses on people with Colombian nationality in the Colombian data, whereas all residents, not differentiated by nationality, are included in the other countries' data. The immunization rate of infants in each year is available from the World Health Organization's website (WHO, 2020). We refer to this in the figure, even though it does not necessarily correspond to the immunization rate at each age as of 2020.
This paper first conducted a regression discontinuity analysis using data from Colombia, the Czech Republic, Thailand, Vietnam, and Singapore. The basic assumption of the analysis is that factors that can affect the COVID-19 infection rate, such as indoor air hygiene, do not change discontinuously around the age at which the BCG vaccination was introduced. If a comprehensive health care reform was implemented simultaneously with the introduction of a universal BCG policy, then this assumption is violated. However, any such bias should operate in the same direction as the expected BCG effects, producing an upward bias. The paper demonstrates that, even with this potential upward bias, no improvement in the COVID-19 infection rate was observed at the age of policy change. The study exploited the following timing of policy changes. In Colombia, the BCG vaccination was introduced in 1960 as a mass campaign targeted at young people under 15 years old (Arbeláez et al., 2000). In 1978, the strain was changed. In the Czech Republic, the policy changes in 1953, 1981, 1994, and 2010 were investigated. In Thailand, the BCG vaccination was introduced in 1977 for infants. In 1987, the strain was changed from the Danish to the Japan strain. Between 1987 and 1991, children were revaccinated at age 7. Therefore, the effects at 1977 and 1991 were investigated. In Vietnam, the BCG vaccination was introduced in 1985 for infants (Jit et al., 2015). Older people were not vaccinated, because Vietnam does not have a revaccination policy. In Singapore, the BCG vaccination started in 1957 (Goh, 1985). Because Singapore revaccinates at ages 6, 11, and 15, the effect at 1942 (i.e., 1957 minus 15) was studied. Vietnam and Singapore do not use the Japan or Russia/Bulgaria strains.
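The logic of the regression discontinuity comparison can be illustrated with a minimal sketch. It is not the estimator used in this paper: the local linear specification, the bandwidth, the function name `rd_jump`, and the synthetic data (a cutoff at age 42 and an injected drop of 0.3 log points) are all assumptions chosen for illustration. The sketch fits separate linear trends in age on each side of the cutoff and reads off the jump in the outcome for cohorts born after the policy began.

```python
import numpy as np


def rd_jump(age, y, cutoff, bandwidth):
    """Local linear estimate of the jump in y at the cutoff age.

    Fits y = a + tau*treated + b*x + c*treated*x by OLS, where
    x = age - cutoff and treated = 1 for cohorts at or below the cutoff
    age (i.e., born in or after the policy year). Only observations
    within `bandwidth` years of the cutoff are used. Returns tau.
    """
    age = np.asarray(age, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(age - cutoff) <= bandwidth
    x = age[keep] - cutoff
    treated = (x <= 0).astype(float)
    X = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return coef[1]


# Synthetic illustration (hypothetical numbers, not the paper's data).
ages = np.arange(20, 65)
cutoff = 42  # hypothetical age of the first vaccinated cohort
treated = (ages <= cutoff).astype(float)

# Log cases per thousand that vary smoothly with age and have no jump.
log_rate_no_effect = 0.5 + 0.01 * (ages - cutoff)
# The same series with an injected protective effect of -0.3 log points.
log_rate_with_drop = log_rate_no_effect - 0.3 * treated

jump_none = rd_jump(ages, log_rate_no_effect, cutoff, bandwidth=10)
jump_drop = rd_jump(ages, log_rate_with_drop, cutoff, bandwidth=10)
print(jump_none, jump_drop)
```

If BCG vaccination were protective, the estimated jump would be negative at the cutoff, as in the second series; a jump indistinguishable from zero corresponds to the first series.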
Second, a difference-in-differences analysis was conducted using all 17 countries with 10-year age-group-level case statistics. The identification assumption in the current model is that, without the BCG vaccination, the expected log difference of the infection rates across age groups is the same across countries. This paper constructed a treatment variable that indicates the share of ages within age group t covered by the BCG vaccination in country i. The paper also constructed a treatment variable that indicates vaccination using the Japan or Russia/Bulgaria strain and another treatment variable that indicates vaccination using other strains. The paper then regressed the log of the number of cases per thousand on these treatment variables, controlling for country and age-group dummies.

Results
Figure 1 displays the results of the regression discontinuity analysis for Colombia, the Czech Republic, Thailand, Vietnam, and Singapore. The figures from Vietnam and Singapore indicate that the immunization rate quickly increased after the vaccination policy was introduced. Thus, the discontinuity of treatment at the policy changes seems to be justified. In (a) Colombia, there is no significant decrease after coverage (74 years old at the end of 2019) or the start of the vaccination policy (59 years old), nor is there a significant increase after the change of strains (41 years old). In (b) the Czech Republic, there is no significant decrease after coverage (84 years old), the start of the vaccination policy (66 years old), or the change to the Japan strain (38 years old), nor is there a significant increase after the change from the Japan strain (25 years old) or the termination of the vaccination policy (10 years old). In (c) Thailand, there is no decrease after the start of the vaccination policy (42 years old) or after the change from the Danish to the Japan strain (32 years old).
In (d) Vietnam, there is no significant drop after coverage and the start of the vaccination policy (34 years old). In (e) Singapore, there is no significant decrease after coverage (77 years old) or the start of the vaccination policy (62 years old). In summary, the results of the regression discontinuity analysis support neither positive BCG effects nor stronger BCG effects with the Japan and Russia/Bulgaria strains. Some argue that the effect of BCG lasts only for 20 to 30 years and hence would not be detected in old cohorts. However, the apparent absence of an effect at the age corresponding to the termination of the universal vaccination policy in the Czech Republic in 2010 addresses this concern.
Figure 2 shows the scatter plot of residualized log cases per thousand and the residualized BCG coverage with the Japan or Russia/Bulgaria strains. The highlighted circles represent Japan, Thailand, Colombia, and Latvia, where the Japan or Russia/Bulgaria strains were once used. Again, we do not observe a negative slope; therefore, the results do not support the hypothesis that the Japan and the Russia/Bulgaria strains that are close to the original Tuberculosis are especially effective. This difference-in-differences analysis includes cohorts for which universal BCG vaccination coverage terminated during the 1990s and 2000s. Thus, the result also addresses the concern that potential BCG effects last only for 20 to 30 years. Note that when country fixed effects are removed, the coefficient becomes significantly negative, consistent with the previous cross-country analysis of Miller et al. (2020). This underscores the importance of controlling for unobserved country-specific characteristics.
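For reference, the two difference-in-differences specifications, referred to as Model (1) and Model (2) in the note to Figure 2, can be sketched as follows. The notation is an assumption introduced here rather than the paper's own: $y_{it}$ denotes cases per thousand in country $i$ and age group $t$, $B_{it}$ the share of ages in the group covered by BCG vaccination, $B^{\mathrm{JR}}_{it}$ and $B^{\mathrm{O}}_{it}$ the corresponding shares for the Japan or Russia/Bulgaria strains and for other strains, and $\mu_i$ and $\lambda_t$ the country and age-group dummies.

```latex
\begin{align}
\log y_{it} &= \beta\, B_{it} + \mu_i + \lambda_t + \varepsilon_{it} \tag{1} \\
\log y_{it} &= \beta_{1}\, B^{\mathrm{JR}}_{it} + \beta_{2}\, B^{\mathrm{O}}_{it}
              + \mu_i + \lambda_t + \varepsilon_{it} \tag{2}
\end{align}
```

Under the BCG hypothesis, $\beta$ in (1) and especially $\beta_{1}$ in (2) would be negative.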

Discussion
There are several limitations. First, the analysis of this paper focuses exclusively on confirmed cases of COVID-19 infection because of data availability. BCG immunization may still prevent the development of symptoms or reduce deaths once a person is infected with COVID-19. However, given that those with COVID-19 symptoms and their close contacts are more likely to be tested and confirmed in the early stage of COVID-19 infection, the presented analysis still captures some effects of BCG immunization on developing symptoms. The number of confirmed cases also does not necessarily represent the total number of people who have COVID-19, owing to testing capacity issues. There is also a possibility that population (in addition to individual) immunity affects COVID-19 infection. However, the presented analysis removed potential population immunity effects of the BCG vaccination by controlling for country-specific unobserved fixed effects. This paper leaves this question to future research. Another potential limitation is that people may have acquired immunity from actual TB infection rather than from BCG immunization. If the effects of BCG immunization and TB infection on COVID-19 infection are the same and the TB infection rate was almost 100% around the change in the vaccination policy, there is a possibility that the presented analysis failed to pick up any additional effects of BCG immunization. However, this is implausible. Marks and others document that in Vietnam, only around 30-40% of people in their 30s and 40-50% of the older population were infected with TB in 2016 (Marks et al., 2018). In Thailand, the infection rate of children under 14 years old in 1977, when the vaccination started, was 15.2% (Sriyabhaya et al., 1993). Since Thailand and Vietnam are among the highest TB burden countries in the sample of this study, we view these numbers as upper bounds. There is also a possibility that the actual immunization rate did not reflect the policy change, but this is also implausible given that BCG vaccination coverage rapidly increased from around 0 to 100% when a universal vaccination policy was introduced, as in Vietnam. Hence, if any BCG-specific effect exists, it should appear at the age of policy change.
Despite the cross-country evidence from the previous literature, the results of both the regression discontinuity and the difference-in-differences approaches do not support the BCG hypothesis. While clinical trials can provide a secure basis for assessing the effectiveness of BCG vaccination against COVID-19, the results of this paper suggest that the findings of the previous literature are likely to suffer from spurious correlations.
Not applicable. All data are publicly available and free to use.
Authors' contributions: All three authors (MF, KK, HM) substantially contributed to study design, data collection, analysis, interpretation and manuscript writing. Author names are listed in alphabetical order, which reflects equal contribution to the authorship of the article.

Consent for publication:
Not applicable.
Availability of data and materials: All data used in this paper are available online.

Figure 1
Regression Discontinuity Analysis Note: BCG coverage indicates the ages that a vaccination policy targeted. BCG at birth is the period when a universal BCG vaccination policy exists. The immunization rate is plotted only when available.

Figure 2
Difference-in-Differences Analysis Note: The sample is at the country and 10-year age-group level. The y-axis is the log cases per thousand after controlling for country and age dummies. The x-axis is the ratio of ages covered by BCG vaccination within each age group after controlling for country and age dummies. The left panel is based on Model (1) and the right panel is based on Model (2). In the right panel, countries with Japan or Russia/Bulgaria strains are highlighted. The size of each circle represents the population size, and the solid line corresponds to a regression line weighted by the population size.
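The construction behind a plot of this kind can be sketched as follows. This is an illustrative sketch rather than the code used for the figure: the helper names (`coverage_share`, `dummies`, `residualized_slope`), the assumption that vaccination occurs at birth, the use of population weights in the residualization step, and all numbers are hypothetical. The sketch computes the share of ages in an age group covered by a policy, removes country and age-group dummies from both variables, and fits a weighted line through the residuals.

```python
import numpy as np


def coverage_share(age_lo, age_hi, start_year, end_year=None, ref_year=2019):
    """Share of single-year ages in [age_lo, age_hi] born under the policy.

    Assumes vaccination at birth and ages measured at the end of ref_year.
    """
    ages = np.arange(age_lo, age_hi + 1)
    birth = ref_year - ages
    covered = birth >= start_year
    if end_year is not None:
        covered &= birth <= end_year
    return covered.mean()


def dummies(labels):
    """One indicator column per distinct label."""
    _, idx = np.unique(labels, return_inverse=True)
    return np.eye(idx.max() + 1)[idx]


def residualized_slope(y, x, country, age_group, weights):
    """Weighted slope of y on x after removing country and age-group dummies."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # All country dummies plus age-group dummies with one category dropped.
    D = np.column_stack([dummies(country), dummies(age_group)[:, 1:]])
    Dw = D * sw[:, None]

    def resid(v):
        vw = np.asarray(v, dtype=float) * sw
        coef, *_ = np.linalg.lstsq(Dw, vw, rcond=None)
        return vw - Dw @ coef

    ry, rx = resid(y), resid(x)
    return float(rx @ ry / (rx @ rx)), rx, ry


# Hypothetical panel: 3 countries x 4 ten-year age groups.
groups = [(20, 29), (30, 39), (40, 49), (50, 59)]
start_years = {"A": 1975, "B": 1985, "C": 2100}  # "C" never vaccinates
country = np.array([c for c in start_years for _ in groups])
age_group = np.array([g for _ in start_years for g in range(len(groups))])
x = np.array([coverage_share(lo, hi, s)
              for s in start_years.values() for lo, hi in groups])

# Outcome built from country effects, age effects, and a known slope of -0.2.
country_effect = {"A": 0.3, "B": -0.1, "C": 0.8}
age_effect = np.array([0.0, 0.2, 0.1, -0.3])
y = np.array([country_effect[c] for c in country]) + age_effect[age_group] - 0.2 * x
population = np.linspace(1.0, 5.0, len(y))  # hypothetical weights

slope, rx, ry = residualized_slope(y, x, country, age_group, population)
print(slope)
```

By the Frisch-Waugh-Lovell theorem, the slope through the residualized points equals the coefficient on coverage in the weighted regression that includes the dummies directly, which is why the scatter plot and the regression convey the same estimate.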