The association of Coronavirus Disease-19 mortality and prior bacille Calmette-Guerin vaccination: a robust ecological evaluation using unsupervised machine learning

Population-level data have suggested that bacille Calmette-Guerin (BCG) vaccination may lessen the severity of Coronavirus Disease-19 (COVID-19), prompting clinical trials in this area. Some reports have demonstrated conflicting results. We performed a robust, ecologic analysis comparing COVID-19 related mortality (CSM) between strictly selected countries based on BCG vaccination program status, utilizing publicly available databases and machine learning to define the association between active BCG vaccination programs and CSM. Validation was performed using linear regression and country-specific modeling. CSM was lower for 80% of similarly clustered countries with a BCG vaccination policy for at least the preceding 15 years (BCG15). CSM increased significantly for each increase in the percent population over age 65. The total population of a country and BCG15 were significantly associated with improved CSM. There was a consistent association between CSM and a BCG vaccination program for the preceding 15 years, but not other vaccination programs.


Introduction
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and the resulting clinical condition coronavirus disease (COVID-19) have caused a worldwide pandemic. There have been 4.8 million confirmed infections and 318,000 deaths worldwide as of May 19, 2020 [1], resulting in significant global and personal insecurity [2,3]. Mitigation of the pandemic requires a multifaceted strategy to reduce clinical morbidity/mortality, prevent disease spread, and, ultimately, develop an effective vaccine. Many promising therapies for COVID-19 have demonstrated limited efficacy and the development of a vaccine will take time [4,5]. Supplementing the existing armamentarium for COVID-19 is therefore of the utmost importance.
The bacille Calmette-Guerin (BCG) vaccine has been administered to almost 4 billion people worldwide for almost 100 years for the prevention of tuberculosis (TB) [6]. Effectiveness for preventing pulmonary TB ranges from 40-60% and serious adverse events related to vaccination approach zero [7]. The BCG vaccine is associated with several favorable effects, including a reduction in neonatal mortality from respiratory infections and sepsis [8] as well as a benefit in the treatment of bladder cancer [9]. When given in conjunction with anti-viral vaccinations including yellow fever and influenza, patients pre-treated with BCG have demonstrated reduced viremia, decreased levels of circulating cytokines associated with cytokine storms, and no difference in, or an improved, anti-viral antibody response [10,11]. These observations may be associated with a shift in the T-cell mediated response to pathogens, enhanced trained innate immunity, and/or an as yet undiscovered pathway [12]. Whatever the mechanism, they provide an immunologic foundation which suggests BCG vaccination is associated with clinically meaningful immunomodulatory function.
Hegarty and colleagues described the association of the crude case fatality rate (CFR) between 179 total countries with active BCG vaccination programs and those without such programs [13]. The CFR was 0.08 vs 34.8 per million for countries with and without BCG vaccination programs, respectively. In concert with the potential mechanisms described above, this work suggested that BCG vaccination might be associated with decreased COVID-19 severity. Since this time, several other authors have described similar trends suggesting that there is some degree of protection from severe COVID-19 infection, especially in elderly populations [14,15]. These observations and the underlying immunomodulatory potential of BCG have prompted several worldwide clinical trials, including the BADAS trial in the US (www.bcgbadas.org), to evaluate the impact of BCG vaccination on the severity and rate of COVID-19 infection.
Employing unsupervised machine learning methods with adjustment for numerous variables and potential established confounders associated with mortality, we evaluated the association between covariates designated a priori, including BCG vaccination programs, and mortality associated with COVID-19 at a country level utilizing pre-specified inclusion criteria.

Methods
Countries were selected for model inclusion based on predefined criteria. Inclusion criteria were: more than 2,000 cases as of May 5, 2020, population greater than 5 million, and land area greater than 1,000 km² (to exclude city-states with the potential for non-representative population densities). Countries where the BCG program start year could not be ascertained were excluded.
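The selection rules above can be expressed as a simple filter. The following is a minimal sketch, not the authors' code: the `Country` record, its field names, and the sample rows are hypothetical and serve only to illustrate how the stated thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    cases: int               # confirmed cases as of May 5, 2020
    population: int          # total population
    land_area_km2: float     # land area in square kilometres
    bcg_start_known: bool    # False if the BCG program start year could not be ascertained


def is_eligible(c: Country) -> bool:
    """Apply the stated inclusion and exclusion criteria to one country."""
    return (
        c.cases > 2_000
        and c.population > 5_000_000
        and c.land_area_km2 > 1_000
        and c.bcg_start_known
    )


# Hypothetical records for illustration only (not data from the study).
candidates = [
    Country("Examplia", 12_000, 40_000_000, 300_000.0, True),
    Country("Smallcity", 9_000, 6_000_000, 700.0, True),      # fails land area
    Country("Fewcases", 1_200, 20_000_000, 150_000.0, True),  # fails case count
    Country("Unknownia", 5_000, 9_000_000, 80_000.0, False),  # start year unknown
]
included = [c.name for c in candidates if is_eligible(c)]
print(included)
```

All four conditions must hold at once, so a country failing any single criterion is dropped.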
All data originated from publicly available data sources (Supplementary Table 1). A set of potential disease related mortality drivers spanning seven domains (socio-economic, health system readiness, environmental, existing disease burden, demographics, vaccination programs, and response to the pandemic) was selected a priori (Supplementary Table 2). COVID-19 specific mortality (CSM) was the primary outcome, defined as deaths related to COVID-19 per million population assessed 30 days after 100 reported cases.
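As an illustration of the outcome definition, the sketch below computes deaths per million at a fixed lag after the day a country first reaches 100 cumulative cases. It is a hedged example rather than the study's procedure: the function name, the use of daily cumulative series, and the choice of "at least 100" as the threshold test are assumptions, and the series are synthetic.

```python
def csm_after_100_cases(cum_cases, cum_deaths, population, lag_days=30):
    """Deaths per million `lag_days` after the first day with >= 100 cumulative cases.

    `cum_cases` and `cum_deaths` are daily cumulative counts aligned by day.
    Returns None if the threshold is never reached or the series is too short.
    """
    for day, cases in enumerate(cum_cases):
        if cases >= 100:  # assumption: threshold day is the first day at or above 100
            target = day + lag_days
            if target >= len(cum_deaths):
                return None
            return cum_deaths[target] * 1_000_000 / population
    return None


# Synthetic series for illustration only: 40 days of cumulative counts.
cases = [40 * d + 20 for d in range(40)]   # reaches 100 on day 2
deaths = [2 * d for d in range(40)]        # 64 deaths on day 32
csm = csm_after_100_cases(cases, deaths, population=8_000_000)
print(csm)  # 8.0 deaths per million
```

Anchoring the measurement to each country's own day of 100 cases aligns countries on epidemic time rather than calendar time.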
Analysis was conducted in a stepwise manner. We sought to group countries into comparable clusters based on previously described CSM drivers. To do this, we first assessed the correlation amongst predetermined variables related to CSM (Supplementary Figure 1), which demonstrated substantial correlation between several explanatory variables. Therefore, exploratory factor analysis, an unsupervised machine learning method, was performed to reduce the original set of explanatory variables. The optimum number of factors was chosen using the scree plot (Supplementary Figure 2). An elbow was observed between 7 and 8 factors (Supplementary Tables 3a and 3b) [16]. Varimax rotation was used to maximize the loading of each variable on a single factor. From each factor group, variables were chosen as inputs for subsequent clustering and multiple regression analysis based on loading characteristics and, where loading values were similar, expert consensus. Given the large size of the first factor group, three variables were selected from the group. Population density was considered as a distinct group given its low loading value (below 0.3) and was included in addition to one other variable from group 6. There was low variation of values for factors in group 7; thus, no variables were included from this group. The variables selected included GDP per capita, population, population density, temperature (Celsius), percentage of the population above 65 years of age, and stringency index (SI) (a measure of country level interventions in response to COVID-19) [17].
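The first step described above, screening the candidate variables for mutual correlation before reducing them, might be sketched as follows. This covers only the correlation check, not the factor analysis or varimax rotation. The variable names, the values, and the 0.7 flagging threshold are hypothetical choices made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.7):
    """Return variable pairs whose absolute correlation meets the threshold."""
    names = list(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical country-level values for illustration only.
data = {
    "gdp_per_capita": [1, 2, 3, 4, 5],
    "life_expectancy": [2, 4, 6, 8, 10.5],
    "temperature": [3, 1, 4, 1, 5],
}
pairs = correlated_pairs(data)
print(pairs)
```

Pairs flagged in this way carry overlapping information, which is the motivation for collapsing them into factors before clustering.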
Countries were then clustered utilizing the k-means algorithm, an unsupervised machine learning method [18]. The optimal number of clusters was determined using the average silhouette coefficient and Dunn Index (Supplementary Table 4, Supplementary Figure 3). Countries within a cluster were further segmented based on categorical metrics related to BCG vaccination programs, including whether the country's BCG vaccination program was active and at least 40 years old or 15 years old, based on prior works indicating a reduction of vaccination efficacy after a period of 15-40 years [19,20]. Within each cluster, deaths per million from COVID-19 thirty days after each country crossed 100 reported cases were compared between countries with universal BCG vaccination programs that were currently active and had been in place for the preceding 40 or 15 years and those without such programs. Countries within each cluster demonstrated lower coefficients of variation in testing rates compared to the whole population, and therefore normalization of testing rates was not performed.
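The clustering step can be illustrated with a small, dependency-free sketch of k-means in which the number of clusters is chosen by the average silhouette coefficient. This is not the authors' implementation (the analysis was run in R, and the Dunn Index is not shown here). The deterministic seeding with the first k points, the two-dimensional toy data, and the candidate range of k are assumptions; in practice the six selected variables would presumably be standardized first.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point."""
    # Assumption: seed centroids with the first k points so the run is reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient; requires at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data for illustration only: three well-separated groups.
points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11),
          (20, 1), (1, 0), (11, 10), (21, 0)]
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

A higher average silhouette indicates clusters that are compact relative to their separation, so the k with the largest score is preferred. Comparing it against a second index, as the Methods describe, guards against relying on a single criterion.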
To explore whether the findings were robust compared to alternate analytical approaches, we performed sensitivity analyses using linear regression models analyzing variables from each of the factor groups, with CSM as the dependent variable. Additionally, age stratified CSM data 57 days after 100 cases (available for 7 countries for comparable periods) were analyzed for the population under 40 years and compared with percent BCG coverage for the population 40 years or younger. Age less than 40 was used since the data for yearly BCG vaccine coverage for infants is reported most reliably from 1980 onwards [21]. The rate of missing information and the analytic strategy utilized for variables with missing information are presented in Supplementary Table 5. AP and AMK had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. The data that support the findings of this study are available from the corresponding author upon reasonable request. RStudio V 1.3.959 (Boston, MA, USA) was used for analysis.

Results
Of 212 countries/territories, 57 countries were included in analysis (Figure 1). Nine city states with insufficient land area or population and 141 countries with insufficient cases were excluded. Four countries met inclusion criteria but start dates for BCG vaccination programs were not available. China was excluded from the analysis as it was the first country to report widespread cases of the virus and therefore might have introduced a lead time bias.
Factor analysis resulted in the identification of six distinct variables: GDP per capita, population, population density, temperature, percent population above 65 years, and stringency index (Table 1). Variables related to BCG administration were part of a distinct factor group. Countries within clusters had lower variation of both COVID-19 testing rates and Global Health Security Agenda (GHSA) scores compared to the overall population. Two cluster solutions, with 6 and 9 clusters, demonstrated the highest scores (Dunn Index and Silhouette Score). Since findings were similar between the 6 and 9 cluster groups and cluster 9 only included 1 country in the 9-cluster solution (Supplementary Table 6), data for the remainder of the manuscript is presented from the six-cluster solution.
Deaths per million related to COVID-19 (CSM) were assessed 30 days after each included country reported 100 cases. Five of 6 clusters allowed division and comparison of CSM by the presence or absence of BCG vaccination programs for the preceding 15 years (BCG15) (Figure 2a). The remaining cluster was composed exclusively of countries with BCG vaccination programs (no comparison group; cluster 2). All 6 clusters allowed division and comparison of CSM by the presence or absence of BCG vaccination programs in the preceding 40 years (BCG40) (Figure 2b). Four of 5 clusters demonstrated lower mortality when they had BCG15 and 4 of 6 clusters demonstrated the same association with BCG40. For BCG40 specifically, clusters 1, 3, 5, and 6 demonstrated improved CSM with hazard ratios of 0.03, 0.01, 0.17, and 0.47, respectively.
Clusters 2 and 4 demonstrated worse CSM with hazard ratios of 2.43 and 2.24, respectively. The results from the 9-cluster analysis were similar (Supplementary Table 7). Granular data regarding clustering is presented in Supplementary Tables 8a/b. Univariate regression analysis demonstrated that the percentage of the population above 65, total 2020 population, BCG15, average temperature, GDP per capita, Stringency Index, and BCG40 were significantly associated with CSM (Table 2). On multivariate analysis, only the presence of BCG15 (reduction of CSM by 71% [95% CI: 53 to 89%]), total population (for every 1 million person increase there was a 1% decrease in CSM [95% CI: 0.53 to 1.47%]), and share of the population above 65 years (CSM increased by 10% for each percent increase in population over 65 [95% CI: 2 to 18%]) were shown to be significantly associated with CSM. Percent coverage metrics for vaccinations including RCV1 (Rubella), MCV1 (Measles) and OPV (Polio) were forced into the model and were not significantly associated with CSM.
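The percent changes quoted above follow from reading the regression estimates as antilogs, that is, as multiplicative effects on deaths per million (see the note to Table 2). The sketch below shows that arithmetic with the reported values. The helper names are hypothetical, and the use of the natural logarithm in `antilog` is an assumption, since the base of the transformation is not stated.

```python
from math import exp, log


def percent_change(ratio):
    """Convert a multiplicative effect (antilog of a log-scale estimate) to a percent change."""
    return (ratio - 1.0) * 100.0


def antilog(beta):
    """Back-transform a coefficient, assuming the outcome was modeled on the natural-log scale."""
    return exp(beta)


# Reported antilog estimates with 95% confidence limits (Table 2).
reported = {
    "pct_population_over_65": (1.10, 1.02, 1.18),
    "BCG15": (0.29, 0.11, 0.47),
}
for name, (est, lower, upper) in reported.items():
    print(name,
          round(percent_change(est)),
          round(percent_change(lower)),
          round(percent_change(upper)))
```

A ratio of 1.10 corresponds to a 10% increase, and a ratio of 0.29 to a 71% decrease. The confidence limits of 0.11 and 0.47 map to decreases of 89% and 53%, which is why the interval for BCG15 is quoted as 53 to 89%.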
Age stratified CSM for those under 40 years of age in relation to BCG coverage percentage for the same population was compared for 7 countries where the latest data was available (Table 3). Countries with no or low coverage for BCG vaccination in the population under 40, including the population between 30-39 and 20-29, had higher CSM for the same age groups, with the exception of Switzerland, which had no reported COVID-19 related deaths in the 20-29 year age group.

Discussion
Using strict criteria designated a priori, we have demonstrated an independent association between BCG vaccine administration programs active for the preceding 15 years and reduced CSM (71% reduction). BCG15 was more strongly associated than BCG40 with CSM, suggesting, as would be expected, improved efficacy for more recently administered vaccinations. It might also represent improved data reliability or vaccination administration for more recent programs. CSM was higher for populations over 65 years of age.
CSM was lower for countries with larger total population, which might suggest that transmission dynamics differ, testing rates are lower, that they are more able to mount a response, or that an as yet unidentified factor is present [22]. OPV, MCV1, and RCV1 vaccination status was not associated with decreased CSM, suggesting that it is not the global presence of vaccination that is associated with CSM but specifically BCG vaccination.
Since we first described the association between BCG vaccination policies and CFR, several additional studies have corroborated this finding [13,23]. Sala et al. demonstrated that TB infection and BCG vaccination strategies were associated with decreased incidence and mortality related to COVID-19 [15].
Shet and colleagues demonstrated a 5.8-fold decrease in COVID-19 related mortality for populations with BCG vaccination [24]. These studies, including ours, are hampered by the quality of the data from which they derive their analysis as well as by the inability to adequately include and capture all potential confounding variables. The present analysis is strengthened by the comprehensive nature of the analysis not present in prior works as well as the a priori definition of input and outcome variables. The clinical validity of increased CSM for populations older than age 65 [25] has been well demonstrated. That this association was also determined in the machine learning models further strengthens the finding that BCG was associated with lower CSM. Efficient contact tracing, isolation, and rapid testing, as part of a larger program of countermeasures, have proven effective at controlling SARS-CoV-2 outbreaks in areas such as China and South Korea. Neither the implementation of rapid contact tracing with targeted isolation, widespread testing, nor regional lockdowns have been as readily deployed in many countries [26,27]. Hensel et al. found that for countries with high testing rates, BCG vaccination no longer correlated with incidence [28]. However, in countries with current BCG vaccination policies and higher rates of testing, BCG vaccination remained significantly associated with reduced rates of CSM [28]. For Israeli adults aged 35-41 with symptoms suggestive of COVID-19, no difference was found in incidence for those born during BCG vaccination programs or those born just after they ended. This represented a young, 6000-person cohort with only 2 cases of severe disease [29] but did highlight the need for data quality and completeness. Our work is further strengthened by evaluating COVID-19 mortality in 7 countries with complete vaccination data for the population under 40, where BCG vaccination continued to demonstrate an association with improved CSM.
The magnitude of the association between BCG and CSM must be taken in context with local responses to COVID-19. For example, in cluster 1, only South Korea (SK) had an active BCG vaccination program and the rates of CSM were lower in this cohort. This effect was again demonstrated for people in SK under the age of 40. The lower rates of CSM in SK might represent BCG vaccination, the efforts of the public health department, or an unknown/unmeasured variable [30]. Similarly, in cluster 2, Ireland was the only country with an active BCG vaccination program, though with decreases in vaccination rates starting in 2005, and it had higher levels of CSM, which might more closely represent a delay in taking COVID-19 measures [31]. In spite of such country specific possibilities, BCG vaccination status remained consistently associated with improved CSM.
We interpret our own findings with a cautionary note since there are numerous potential measured and unmeasured confounding variables, including rates of BCG vaccination compliance, age at vaccination, potential strain differences among BCG vaccines, regional variations within countries, the lack of a verified metric to measure country-level COVID-19 response effectiveness, no measures of health system capacity to provide effective critical care, and other, as yet unidentified factors. We agree with the sentiments of the World Health Organization and caution against routine BCG vaccination for the prevention of COVID-19 until prospective trials are completed. It is unclear if the protection from neonatal vaccination with BCG is transferable to those receiving vaccination as an adult and how long such protection lasts. That is why some of the authors have initiated the BADAS trial in the US (national clinical trial [NCT] number NCT04348370), joining other trials evaluating BCG administration for either COVID-19 prevention or disease severity reduction, including NCT04327206 (BRACE, Australia), NCT04328441 (BCG-CORONA, Netherlands), and NCT04350931 (Egypt). This analysis represents an attempt to utilize machine learning methods to address important questions in the field of medicine, which might foster accelerated research in medicine and epidemiology.

Conclusion
For countries included in our analysis using rigid entry criteria defined a priori, the presence of an active BCG immunization program for the past 15 years and total population are associated with improved COVID-19 specific mortality, while the share of the population over 65 years of age is associated with increased CSM.
For the included countries, BCG15 vaccination programs are associated with a 71% reduction in the risk for CSM independent of population, population density, temperature, share of population above 65 years, and the stringency index of each country. A reduction in CSM was observed in 80% of country clusters for BCG15. This ecological analysis provides the most robust data regarding the association of COVID-19 specific mortality and BCG vaccination programs. These findings suggest that BCG vaccination is one of many potential additions to our armamentarium in the fight to reduce mortality related to COVID-19.

Tables

Table 1:
Simplified composition of 6 clusters included for analysis. Variation is lower in the clusters than the general population.
***Coefficient of variation for the population GHSA was 26.1. For all but cluster 6, there is less variation in the clusters than the population.
diagram for the selection of countries included in evaluation. 212 countries and territories were initially screened with at least 1 case of COVID-19 as of May 5, 2020. Based on predetermined inclusion and exclusion criteria, 57 countries were included in the analysis.

Table 2:
Results of linear regression analysis. The percentage of the population over 65 years of age was associated with higher rates of CSM such that for every percent increase in population over 65 years old, CSM increased by 10%.
* The antilog of all estimates is reported. For example, when the % population above 65 yrs. increases by 1%, the deaths/mn on average increase by 1.10 times, i.e., 10% (with a 95% CI of 1.02 to 1.18, or 2% to 18%). Likewise, when a country has BCG coverage in the last 15 years, the deaths/mn decrease to 0.29 times, i.e., by 71% (with a 95% CI of 0.11 to 0.47, or 89% to 53%).

Table 3:
Analysis of CSM for populations aged <40 compared with the percent BCG vaccination rate in the same age group.Countries with lower rates of BCG coverage generally have higher rates of CSM suggesting that, even in a less vulnerable population, BCG vaccination is associated with improved CSM.