**Disproportionate distribution of COVID-19 cases and deaths across countries**

Although the COVID-19 pandemic has spread to all countries in the world, countries have been affected differently, based on data as of August 1, 2020. The U.S. recorded more than 5.5 million cases and more than 170,000 deaths due to COVID-19, whereas in some other countries these numbers are much lower, as shown in a scatter plot (**Figure 1A**). After controlling for population size, the disparity among countries remains, though a few countries with small populations, such as San Marino, lead the chart in COVID-19 cases and deaths per million population (**Figure 1B**). Case and death numbers are highly correlated: in general, the higher the case number in a country, the higher the death number, although there are outliers with high case numbers but low death numbers, such as Qatar and Singapore (**Figure 1**). The unequal distribution of COVID-19 deaths and cases across countries raises the question of what factors are important for the susceptibility of a population to SARS-CoV-2 infection.
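
The per-million normalization behind the population-adjusted comparison is simple arithmetic; a minimal sketch follows. The function name `per_million` and all numbers in the example are illustrative assumptions, not values from the study's dataset.

```python
def per_million(count, population):
    """Scale a raw count (cases or deaths) to a rate per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


# Made-up numbers for illustration only (not from the study's data):
# a very small country versus a very large one.
small = per_million(700, 35_000)
large = per_million(5_000_000, 330_000_000)
print(small > large)
```

Because the denominator is small, a modest absolute count in a small country can translate into a very high per-million rate, which is why small countries can top a per-million ranking.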

**Demographic and socioeconomic factors correlated with COVID-19 impact**

To understand which demographic or socioeconomic factors are important for the impact of COVID-19, two different multivariate statistical analyses were employed. First, a pairwise Pearson correlation coefficient analysis was carried out for several factors with data available for most countries. These factors include each country’s total cases and deaths per million population due to COVID-19; total number of SARS-CoV-2 virus tests carried out; government response stringency index, which records the strictness of ‘lockdown style’ policies that primarily restrict people’s behavior 3; population density (number of people per square kilometer of land); median age; per capita gross domestic product (GDP) (2019); extreme poverty index; hospital beds per thousand people; coverage rates of Bacille Calmette-Guérin (BCG) vaccination; and diphtheria-tetanus-pertussis (DTP3) immunization rates. BCG is a vaccine against tuberculosis (TB), but there have been observations of its correlation with COVID-19 impact 4. DTP3 is a combined vaccine offered to young children in many, but not all, countries, with historical data available similarly to those for BCG immunization 5. Since COVID-19 death and case numbers are highly correlated, they were analyzed separately in the following two multivariate statistical analyses.

**Factors correlated with COVID-19 deaths and cases in pairwise analysis**

The MATLAB® function ‘corrcoef’ was first used to compute Pearson’s correlation coefficients for the statistical relationships between variables. The analysis returned a matrix of correlation coefficients calculated from an input matrix whose rows are observations (210 countries and regions) and whose columns are variables (e.g., deaths per million, BCG rates). When these 10 factors were analyzed in this way, COVID-19 deaths per million was found to correlate most significantly (and negatively) with a country’s BCG vaccination coverage rate (r=–0.50, p=5.3e-5). COVID-19 deaths per million also correlates significantly (and positively) with a country’s per capita GDP (r=0.39, p=0.0074) and with its median age (r=0.30, p=0.042) (**Figure 2, Table 1**). COVID-19 deaths were not significantly correlated, however, with a country’s stringency index, population density, virus tests, extreme poverty rate, hospital bed availability, or DTP3 immunization coverage (**Table 1**). Thus, fewer COVID-19 deaths were found in countries with higher rates of BCG vaccination, whereas more COVID-19 deaths were found in countries with higher per capita GDP or an older population.
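
The pairwise analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's MATLAB code: the function names are hypothetical, missing values are represented as `None` and dropped pair by pair (the text does not say how missing data were handled), and the example values are made up. Only the t statistic for testing r = 0 (with n − 2 degrees of freedom) is computed here; converting it to a p-value requires a Student's t distribution, which the Python standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Return (r, n): Pearson's r over the pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def correlation_matrix(columns):
    """columns: dict mapping a variable name to its list of per-country values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b])[0] for b in names}
            for a in names}


# Made-up values for illustration only (not from the study's data).
toy = {
    "deaths_per_million": [300.0, 10.0, 150.0, 5.0, 80.0],
    "bcg_coverage": [0.0, 95.0, 20.0, 99.0, None],
}
r, n = pearson_r(toy["deaths_per_million"], toy["bcg_coverage"])
print(round(r, 3), n, round(t_statistic(r, n), 3))
```

Each off-diagonal entry of the resulting matrix corresponds to one pairwise test, so a matrix over many factors involves many simultaneous comparisons.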

When COVID-19 cases per million were analyzed with the same method, however, they were found to correlate significantly only with tests per thousand and per capita GDP, suggesting that high-income countries that carried out more tests recorded more COVID-19 cases (**Table 2**). The difference between COVID-19 case and death numbers in their relationships with other demographic and socioeconomic factors calls for further analysis with different methods.

**Multivariate analysis for COVID-19 cases and deaths by linear regression**

The second method used for multivariate statistical analysis of COVID-19 cases and deaths was generalized linear model regression with the MATLAB® function ‘glmfit’. Here, COVID-19 deaths and cases per million in each country were used, in turn, as the observed response, and all other factors were used as predictors. Results from the generalized linear model regression are similar to those from the Pearson’s correlation coefficient analysis above, although with interesting differences. Specifically, the most significant factor predicting COVID-19 deaths is, again, the BCG immunization rate (p=3.7e-7; **Table 3**), consistent with the Pearson’s correlation coefficient analysis. Also consistent with the Pearson’s correlation analysis, per capita GDP significantly correlates with COVID-19 deaths (p=0.016). Unlike the Pearson’s correlation analysis, the multivariate linear regression found that DTP3 immunization rates (p=0.0001), but not median age (p=0.0647), significantly correlate with COVID-19 deaths (**Table 3**). However, since DTP3 immunization rates correlate positively with COVID-19 deaths, this significant result is less informative, because it would suggest that DTP3 immunization promoted COVID-19 deaths. Thus, two different statistical methods both suggest that BCG vaccination reduces COVID-19 deaths.
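
The text does not state which distribution or link function was passed to ‘glmfit’. Assuming the default normal distribution with identity link, the fit reduces to ordinary least squares with an added intercept, which the following sketch reproduces in plain Python. The function names and example data are hypothetical, and the sketch returns coefficients only; the standard errors and p-values reported in the tables are not computed here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(predictors, response):
    """Least-squares fit; returns [intercept, b1, b2, ...].

    predictors: one row of predictor values per country.
    response:   one observed value (e.g. deaths per million) per country.
    """
    design = [[1.0] + list(row) for row in predictors]  # prepend constant term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations


# Made-up data generated from y = 2 + 3*x1 - 1*x2 (illustration only).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 5, 1, 4, 7]
print([round(c, 6) for c in ols_fit(rows, y)])
```

Solving the normal equations directly is adequate for a handful of predictors, as here; with many or strongly correlated predictors, a QR-based solver from a numerical library would be the more stable choice.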

When COVID-19 case numbers were analyzed with multivariate linear regression, per capita GDP (p=1.75e-7) and tests per 1000 population (p=0.0093) were found to correlate significantly with COVID-19 cases (**Table 4**), consistent with the Pearson’s correlation coefficient analysis. In addition, multivariate linear regression indicates that median age, government response stringency, and poverty levels are also correlated with COVID-19 cases (**Table 4**), unlike the results from the Pearson’s correlation coefficient analysis.

BCG is a tuberculosis (TB) vaccine used in many countries with high TB prevalence 4. It is not generally used, however, in countries with a low risk of TB infection, such as the US and most Western European countries. There have been reports on the association between COVID-19 and BCG with both positive and negative results 6-8, and there are ongoing clinical trials using BCG as a vaccine against COVID-19 9,10.

This study used BCG vaccination rates from Our World in Data 11, with the rates averaged over all years with available data (1980 to 2019) for each country, and these were further verified with data from The BCG World Atlas 12. Although the World Health Organization (WHO) started recording BCG coverage data only after 1980, most countries with a BCG policy started immunization before 1980 12. The average BCG vaccination rates over all the years show a highly significant inverse correlation with COVID-19 deaths by two statistical methods (**Tables 1, 3**). Indeed, when countries are sorted by mean BCG immunization rate, it can be seen that high COVID-19 death numbers occurred more often in countries with low BCG vaccination rates (**Figure 3**).
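
The per-country averaging described above can be sketched as follows. The data layout (a dictionary of yearly coverage percentages per country) and the function name `mean_coverage` are assumptions made for illustration, and the values are made up. How countries with no recorded coverage were treated is not stated in the text; this sketch returns `None` for them rather than guessing.

```python
def mean_coverage(yearly_rates):
    """Average yearly coverage (%) per country over the years that have data.

    yearly_rates: {country: {year: rate or None}}
    """
    means = {}
    for country, by_year in yearly_rates.items():
        values = [v for v in by_year.values() if v is not None]
        means[country] = sum(values) / len(values) if values else None
    return means


# Made-up values for illustration only (not from the study's data).
toy = {
    "Country A": {1980: 90.0, 1981: None, 1982: 100.0},
    "Country B": {},
}
print(mean_coverage(toy))
```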

BCG coverage rates across the world’s countries show a dichotomous pattern, with a steep drop from high coverage rates (nearly 100%) to no coverage (0%) (**Figure 3A, B**). Very few countries have a BCG coverage rate around 50% (a total of 9 countries have BCG rates from 40% to 60%). When countries were separated by BCG coverage rate into two groups, BCG rates ≥50% (denoted as “BCG”) and BCG rates <50% (denoted as “No BCG”), the two groups show significantly different mean deaths, but not significantly different mean cases (**Figure 3C, D**). This is in agreement with results from both the Pearson’s correlation coefficient analysis and multivariate linear regression.
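
The two-group comparison can be sketched as below. The text does not name the test used to compare group means, so Welch's unequal-variance t statistic is shown purely as an assumed stand-in; the function names and example numbers are hypothetical. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value to a statistics library.

```python
import math
from statistics import mean, variance


def split_by_bcg(rows, threshold=50.0):
    """rows: (bcg_rate_percent, outcome) pairs. Returns (bcg, no_bcg) outcomes."""
    bcg = [outcome for rate, outcome in rows if rate >= threshold]
    no_bcg = [outcome for rate, outcome in rows if rate < threshold]
    return bcg, no_bcg


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Each group needs at least two values (sample variance is undefined otherwise).
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up (BCG rate %, deaths per million) pairs, for illustration only.
toy = [(98, 5.0), (95, 12.0), (90, 8.0), (0, 300.0), (10, 450.0), (45, 200.0)]
bcg, no_bcg = split_by_bcg(toy)
print(welch_t(bcg, no_bcg))
```

Welch's variant is chosen here because the two groups need not have equal size or variance; it should not be read as the test the study applied.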

In summary, factors affecting COVID-19 cases and deaths were analyzed using Pearson’s correlation coefficient and multivariate linear regression. Both statistical methods indicate that COVID-19 deaths correlate most significantly with a country’s BCG immunization rate, while the number of coronavirus-positive cases correlates with the country’s resources and testing capabilities.

**Correlation between COVID-19 death and GDP may be due to age**

A single factor identified by two statistical methods as significantly correlated with both COVID-19 cases and deaths is per capita GDP (**Tables 1-4**). It has been reported that COVID-19 affects older people more than younger people 13. Indeed, worldwide COVID-19 deaths significantly correlate with the median age of nations (r=0.30, p=0.042) by Pearson’s correlation (**Figure 2, Table 1**). In addition, the median age of a country correlates highly significantly with its per capita GDP (r=0.64, p=1.93e-6). Thus, a plausible explanation is that high per capita GDP leads to a longer lifespan, thereby raising the median age of a country. In other words, the higher case and death numbers in wealthy countries were likely due to higher age acting as a confounding factor in those countries.

**BCG vaccination and COVID-19 death inversely correlate in countries with high median age**

Since age may be a significant confounder in the correlation between GDP and COVID-19, is age similarly a confounding factor in the correlation between BCG vaccination and COVID-19? Correlation coefficient analysis shows that BCG vaccination does not significantly correlate with a country’s median age (r=–0.24, p=0.107; **Table 1**), although there appears to be a trend that countries with lower BCG vaccination rates have a higher median age. This raises the question of whether the higher COVID-19 case and death numbers in countries lacking BCG vaccination could be confounded by their having a more elderly population.

To reduce the possible confounding effects of age, propensity score matching for age was used to further evaluate the correlation between BCG vaccination and COVID-19. To this end, all countries were divided into three groups by median age. The median ages of the world’s 210 countries and regions range from 15.1 (Niger) to 48.2 (Japan) (**Table 5**). These countries were discretized into “young”, “medium”, and “old” groups based on their median age, and the correlation of BCG vaccination with COVID-19 was analyzed separately in each age-matched subgroup. BCG vaccination was not associated with COVID-19 cases or deaths in “young” and “medium” countries; it remains significantly negatively associated with both COVID-19 cases and deaths only in “old” countries (**Table 5**). Importantly, in these 61 “old” countries, BCG immunization rate and median age are no longer inversely correlated (r=0.116, p=0.373) (**Table 5**).
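
The grouping described above amounts to stratifying countries by median age and repeating the correlation within each stratum. The sketch below illustrates that idea under loud assumptions: the text does not give the age cut points, so near-equal terciles are used here purely for illustration, and the field names, function names, and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tercile_groups(countries):
    """Sort by median age and cut into three near-equal groups (assumed cut points)."""
    ordered = sorted(countries, key=lambda c: c["median_age"])
    n = len(ordered)
    cut1, cut2 = n // 3, 2 * n // 3
    return {"young": ordered[:cut1],
            "medium": ordered[cut1:cut2],
            "old": ordered[cut2:]}


def correlation_by_group(countries, x_key="bcg", y_key="deaths_per_million"):
    """Pearson's r between x_key and y_key within each age group."""
    result = {}
    for label, members in tercile_groups(countries).items():
        xs = [c[x_key] for c in members]
        ys = [c[y_key] for c in members]
        result[label] = pearson_r(xs, ys)
    return result


# Made-up records for illustration only (not from the study's data).
toy = [
    {"median_age": 16, "bcg": 80, "deaths_per_million": 5},
    {"median_age": 18, "bcg": 90, "deaths_per_million": 3},
    {"median_age": 20, "bcg": 95, "deaths_per_million": 4},
    {"median_age": 28, "bcg": 60, "deaths_per_million": 20},
    {"median_age": 30, "bcg": 85, "deaths_per_million": 30},
    {"median_age": 32, "bcg": 95, "deaths_per_million": 10},
    {"median_age": 40, "bcg": 10, "deaths_per_million": 300},
    {"median_age": 42, "bcg": 50, "deaths_per_million": 200},
    {"median_age": 44, "bcg": 90, "deaths_per_million": 100},
]
print(correlation_by_group(toy))
```

Analyzing each stratum separately limits how much of an association can be attributed to age differences between strata, although age can still vary within a stratum.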

When the 61 “old” countries are separated by BCG coverage rate (≥50% vs <50%), the 36 countries with ≥50% BCG coverage have an average median age of 41.8±2.7, which is almost the same as the average median age of 41.1±3.025 for the countries with <50% BCG coverage (**Figure 4**). Thus, after controlling for the confounding effects of old age by propensity score matching, the inverse correlation remains statistically significant between BCG vaccination and COVID-19 cases (r=–0.30, p=0.019), and between BCG vaccination and COVID-19 deaths (r=–0.42, p=0.0007), in high median-age (“old”) countries. These results suggest that BCG vaccination may reduce COVID-19 mortality in high median-age countries.