Infectious Disease as a Mechanism Linking Health and Income Inequality

Background: Within-country income inequality has been rising rapidly worldwide since the 1970s. An extensive literature has examined the effect of inequality on health and finds health outcomes to be worse in more unequal countries. The health measures used include life expectancy, mental illness, obesity, infant mortality, teenage births, homicides, and imprisonment, or some weighted index of such measures. While informative, these measures are indirect. Moreover, many studies fail to establish an independent effect of inequality on health. Finally, if the individual-level relationship between health and income is non-linear, cross-sectional studies may detect a spurious association between inequality and health because of an aggregation problem. This paper studies the relationship between ambient income inequality and the incidence of infectious disease, a direct and vital measure of health. Our hypothesis is that the more income-unequal a society, the higher the chance that random mixing of people from different income strata brings the infected and the uninfected into contact, thereby raising disease spread. This implies that two countries with similar per capita incomes but different levels of income inequality can exhibit very different trajectories of disease spread. We investigate this causal pathway by examining whether countries with higher income inequality have higher per capita rates of tuberculosis (TB) incidence. TB is an appropriate choice because it is an enduring, serious threat to global public health and the leading cause of death from infectious disease worldwide. Moreover, the four stages of TB pathogenesis (exposure to infection, progression to disease, late or inappropriate diagnosis and treatment, and treatment adherence) are well known to differ between rich and poor individuals.

Methods: We used publicly available panel data for 133 countries between 1997 and 2013. The data include TB incidence and prevalence per 100,000 people, as well as income inequality (Gini coefficients) that vary both across countries and over time within countries. Our multivariate regression model controlled for economic output per capita, HIV prevalence, public health expenditure, population density, and poverty, among other variables, and included country fixed effects and time fixed effects. We also apply a novel correction for “aggregation bias” using data on diabetes, a non-communicable disease.

Findings: Higher income inequality was positively associated with, and causally connected to, tuberculosis prevalence. All else equal, countries whose income Gini coefficients differ by a mere 10% would likely see a 5% difference in tuberculosis prevalence.

Interpretation: Like any airborne infectious disease, TB is akin to a pollutant that spoils air quality and makes it unhealthy for all who breathe it. Our findings suggest that ambient income inequality is a significant cause of this externality. In effect, TB is a negative externality whose reach is amplified by income inequality. Around the world, the emergence of COVID-19 has renewed focus on the importance of reducing income differences. We join that chorus by arguing that policy aimed at reducing income inequities could directly reduce the TB burden by lowering the chance that infection spreads through contact between the poor and the rich.


Introduction
If A is poor and B is rich, it is reasonable to expect B to be healthier than A. By contrast, the mechanism by which the gap in income between A and B could matter for their individual health is less well understood.
Does this gap matter in its own right, apart from the effect of incomes on health?
There is a well-studied and documented direct connection between income and health: high-income people and countries are likely to be healthier. The positive slope is intuitive. After all, low-income people often face significant barriers to medical care access 1,2 and are more likely to smoke 3,4, abuse drugs 5,6, be obese 7, and face chronic stressors 7,8 such as financial hardship. Together, these factors can reduce immunity, increase vulnerability to disease, and cause poor health.
The presence of a health-income inequality gradient 9,10,11,24, on the other hand, is more controversial 12. One set of explanations asserts that A may have poorer health if A feels economically disadvantaged relative to B in a reference group, which precipitates stressful social comparisons 13. Ambient income inequality may also influence the health of both A and B if affluent people like B support disinvestment in public health, education, or social capital 14,25. These explanations are usually indirect. They rely on interpersonal or group-identity comparisons, or on indirect "general equilibrium" considerations.
Our broad question is this: does income inequality have a first-order direct effect on health? Specifically, do countries with similar levels of per capita income but very dissimilar levels of income inequality have very different health outcomes? We narrow this question substantially by focusing on health outcomes affected by communicable, or infectious, disease. We explore the hypothesis that the link between health and income inequality is partly mediated by infectious disease: in locations with greater income inequality and greater residence-work integration, infectious disease spreads more easily.
The underlying logic is simple and relies on three links.
• First, assume A and B live in a high-income-inequality area (X). This means X has few rich people like B and many poor people like A, so the median income is low yet the mean income is high. Counterparts of A and B also live in a comparable area, Y, which has a similar mean income but a higher median income than X.
• Second, suppose that poor people like A are significantly more prone to catching and harboring latent infections from an infectious disease such as tuberculosis (TB), because of the health-income gradient discussed above.
• Finally, suppose people like A and B randomly mix in social settings (such as schools, food markets, malls, stores, stadia, restaurants, public transportation, and so on).
We posit that infections are more likely to spread or linger in area X than in Y. The point is that in a low-income-inequality area, a random encounter between two people is very likely a meeting between people close in income, and therefore close in health. Such contacts are unlikely to lead to new infections. The opposite is true in a high-income-inequality area.
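A small simulation can illustrate the three links. This sketch assumes that incomes are lognormally distributed, which is a common modeling choice but a hypothetical one here. Both areas have the same mean income. A larger dispersion parameter `sigma` lowers the median, which matches the description of area X. The code then draws random pairs to mimic random mixing and reports the share of encounters between people whose incomes differ more than threefold. The threefold cutoff, the function names, and all parameter values are illustrative assumptions.

```python
import numpy as np


def lognormal_incomes(mean_income, sigma, n, rng):
    """Draw n lognormal incomes with a fixed mean; a larger sigma means more inequality."""
    # For a lognormal, E[income] = exp(mu + sigma**2 / 2), so solve for mu to pin the mean.
    mu = np.log(mean_income) - sigma**2 / 2
    return rng.lognormal(mean=mu, sigma=sigma, size=n)


def share_far_apart(incomes, ratio, n_pairs, rng):
    """Share of randomly mixed pairs whose incomes differ by more than `ratio`-fold."""
    a = rng.choice(incomes, size=n_pairs)  # random mixing: both partners drawn at random
    b = rng.choice(incomes, size=n_pairs)
    return float(np.mean(np.maximum(a, b) / np.minimum(a, b) > ratio))


rng = np.random.default_rng(42)
# Hypothetical areas with the same mean income (20,000) but different dispersion.
area_y = lognormal_incomes(20_000, sigma=0.4, n=100_000, rng=rng)  # low inequality
area_x = lognormal_incomes(20_000, sigma=1.0, n=100_000, rng=rng)  # high inequality

for name, incomes in (("Y", area_y), ("X", area_x)):
    share = share_far_apart(incomes, ratio=3, n_pairs=100_000, rng=rng)
    print(name, round(incomes.mean()), round(float(np.median(incomes))), round(share, 3))
```

For a lognormal, the log-income gap between two randomly drawn people is normally distributed with standard deviation σ√2. The share of far-apart pairs therefore rises steeply with σ, even though mean income is unchanged.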
Figure 1: Tuberculosis incidence. (a) Log TB incidence versus log Gini; (b) log TB incidence versus log GDP per capita.
A first look at cross-country data suggests prima facie evidence in favor of our hypothesis. Figure 1 plots log TB incidence against log Gini (panel a) and against log GDP per capita (panel b). This first look at the data is provocative, but many extraneous factors apart from inequality might explain the correlation. We address this problem in three ways. First, we adopt a longitudinal approach in our data analysis and focus on the relationship between changes in inequality and disease incidence within countries, rather than on cross-country comparisons. Second, we use a statistical method that adjusts for measurable variables that may explain the correlation. Finally, we use the relationship between inequality and the prevalence of non-communicable diseases, specifically diabetes and cancer, as a test of our hypothesis. Because these diseases are not communicable, we should observe no relationship between income inequality and their prevalence if the hypothesis is correct.

Data
We draw on data from the World Health Organization (WHO), the World Bank, and the World Income Inequality Database (WIID) to construct a panel for 133 countries during 1995-2013.

Disease incidence and prevalence
We take data on TB incidence and prevalence from the global TB database of the WHO. Incidence is the number of new cases in a year per 100,000 people.
Our data on diabetes prevalence are from NCD-RisC 15. This source pools 751 population-based studies that collected biomarkers, and it estimates trends using a Bayesian hierarchical model. Diabetes data are available only as prevalence, which accumulates past new incidence. Because diabetes is incurable, we take the first difference of prevalence to measure new incidence.
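The first-difference step can be sketched as follows. This assumes prevalence is stored per country-year; the function name and dictionary layout are hypothetical. Only consecutive years are differenced, so a gap in the series yields no incidence value for the year after the gap.

```python
def incidence_from_prevalence(prevalence):
    """First-difference diabetes prevalence within each country.

    `prevalence` maps (country, year) -> prevalence. Only consecutive years are
    differenced. The change in prevalence also nets out deaths among existing
    cases, so it is a proxy for, not an exact count of, new incidence.
    """
    return {
        (country, year): value - prevalence[(country, year - 1)]
        for (country, year), value in prevalence.items()
        if (country, year - 1) in prevalence
    }


example = {("A", 2000): 5.0, ("A", 2001): 5.5, ("A", 2003): 6.5,
           ("B", 2000): 8.0, ("B", 2001): 7.75}
print(incidence_from_prevalence(example))  # {('A', 2001): 0.5, ('B', 2001): -0.25}
```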
Cancer incidence is taken from the Institute for Health Metrics and Evaluation 16, which collected cancer registry data for the period 1980-2010. The data contain breast and cervical cancer incidence among women aged 15-79. Because these cancers occur almost exclusively in women, we normalize incidence per 100,000 female population aged 15-79. We use these data to test our hypothesis that income inequality affects the spread of infectious disease but not the incidence of non-communicable diseases.
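The normalization is a simple rate calculation. The sketch below uses hypothetical argument names and sums breast and cervical cases, since the cancer variable is described later in this paper as their sum.

```python
def cancer_incidence_rate(breast_cases, cervical_cases, female_pop_15_79):
    """Breast plus cervical cancer cases per 100,000 women aged 15-79."""
    return (breast_cases + cervical_cases) * 100_000 / female_pop_15_79


# Hypothetical country-year: 50 breast and 30 cervical cases among 400,000 women.
print(cancer_incidence_rate(50, 30, 400_000))  # 20.0
```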

Income inequality
The Gini coefficient is the most frequently used measure of the extent of income inequality in a given community or society. It is defined as half of the arithmetic average of the absolute differences between all pairs of incomes in a population, normalized by mean income. If income in a population is distributed completely equally, the Gini value is 0; if a single person has all the income (maximum inequality), the Gini is 1.0. We use Gini coefficient data from the World Income Inequality Database (WIID). We exclude data that are reported as low quality or that are not representative of the country's entire geographic area or population. For years with missing data, we impute the value of the closest preceding observation.
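The definition above translates directly into code. In this minimal sketch (the function name `gini` is ours), the mean absolute difference is computed over all ordered pairs and then divided by twice the mean. With n people, the maximum attainable value is (n - 1)/n, which approaches 1 as n grows.

```python
import numpy as np


def gini(incomes):
    """Gini = (mean absolute difference over all ordered pairs) / (2 * mean income)."""
    x = np.asarray(incomes, dtype=float)
    n = x.size
    # |x_i - x_j| for every pair (i, j), including i == j; O(n^2) memory, fine for small samples
    mean_abs_diff = np.abs(x[:, None] - x[None, :]).sum() / n**2
    return float(mean_abs_diff / (2 * x.mean()))


print(gini([10, 10, 10, 10]))  # 0.0  (perfect equality)
print(gini([0, 0, 0, 40]))     # 0.75 (one person has everything: (n - 1) / n)
```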

Control variables
We select control variables based on those most often used in the literature. These include economic variables that affect new TB infections: years of schooling (from the Penn World Table), and GDP per capita, public health expenditure (% of GDP), poverty, and the share of people living in urban areas (from the World Bank).
Data on HIV prevalence and BCG coverage are added to control for medical conditions related to TB risk.

Dynamic Panel Data Analysis
We construct a panel of 133 countries over the period 1995-2013 for the variables discussed above. Our dependent variables are the annual new incidence of TB, diabetes, and cancer. The explanatory variable of interest is the Gini. We log-transform the variables to achieve log-linearity. This has the added benefit that the regression coefficient on the Gini is an elasticity: the percentage change in disease incidence associated with a 1% change in the Gini. We estimate the relationship between changes in disease incidence and changes in income inequality rather than the cross-sectional correlation. This reduces omitted-variable bias by controlling for unobservable, time-invariant country characteristics, such as socio-cultural and geographical differences.
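A synthetic example shows why the within-country (fixed-effects) design matters. The sketch below generates a hypothetical panel in which log Gini is correlated with an unobserved, time-invariant country trait. Pooled OLS then mixes that trait into the Gini coefficient. Demeaning each country's series (the within transformation) recovers the assumed elasticity of 0.5. Time fixed effects and controls are omitted for brevity, and all numbers are made up.

```python
import numpy as np


def simulate_log_panel(n_countries=500, n_years=10, beta=0.5, seed=1):
    """Synthetic panel: log TB = beta * log Gini + country effect + noise (hypothetical)."""
    rng = np.random.default_rng(seed)
    country_effect = rng.normal(size=(n_countries, 1))  # time-invariant, unobserved
    # log Gini is correlated with the unobserved country effect, which confounds cross-sections
    log_gini = 0.8 * country_effect + rng.normal(scale=0.3, size=(n_countries, n_years))
    log_tb = beta * log_gini + country_effect + rng.normal(scale=0.2, size=(n_countries, n_years))
    return log_gini, log_tb


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept), pooling all cells."""
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / (x**2).sum())


def within_slope(x, y):
    """Fixed-effects slope: demean each country's series before regressing."""
    return ols_slope(x - x.mean(axis=1, keepdims=True), y - y.mean(axis=1, keepdims=True))


log_gini, log_tb = simulate_log_panel()
print(round(ols_slope(log_gini, log_tb), 2))     # pooled: absorbs the country effect
print(round(within_slope(log_gini, log_tb), 2))  # within: close to the true elasticity 0.5
```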
We employ a multivariate panel data model that controls for variables that might affect disease incidence and inequality; these controls are discussed in the previous section. In modeling the infectious disease, we include lagged prevalence as a measure of direct exposure; lagged prevalence is itself a function of past TB incidence. Including it can make a standard panel fixed-effects estimator biased and inconsistent, because the lagged regressor is correlated with the error term. We apply the Arellano-Bover 17/Blundell-Bond 18 methods to correct this problem. Technical details are provided in the Appendix. No such correction is required when modeling non-communicable diseases. Diabetes and cancer do not spread through interaction, so direct exposure to the disease does not cause new incidence.
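The bias that motivates the Arellano-Bover/Blundell-Bond correction can be demonstrated by simulation. The sketch below is not the authors' estimator. For brevity, it uses a single series regressed on its own lag, whereas the paper's lagged regressor is prevalence. It also swaps in the simpler Anderson-Hsiao instrumental-variable estimator, a precursor of those GMM methods. Anderson-Hsiao first-differences away the country effect and then instruments the lagged difference with the level two periods back. The short panel (six periods) mirrors the setting in which the within estimator is most biased. All names and parameters are hypothetical.

```python
import numpy as np


def simulate_dynamic_panel(n_units=20_000, n_periods=6, rho=0.5, seed=0):
    """Simulate y_it = rho * y_i,t-1 + mu_i + e_it with unit effects mu_i (synthetic data)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n_units)
    y = np.empty((n_units, n_periods))
    # Start each unit near its stationary distribution.
    y[:, 0] = mu / (1 - rho) + rng.normal(size=n_units) / np.sqrt(1 - rho**2)
    for t in range(1, n_periods):
        y[:, t] = rho * y[:, t - 1] + mu + rng.normal(size=n_units)
    return y


def within_estimate(y):
    """Fixed-effects OLS of y_t on y_{t-1}; inconsistent when the panel is short."""
    y_now, y_lag = y[:, 1:], y[:, :-1]
    y_now = y_now - y_now.mean(axis=1, keepdims=True)
    y_lag = y_lag - y_lag.mean(axis=1, keepdims=True)
    return float((y_now * y_lag).sum() / (y_lag**2).sum())


def anderson_hsiao_estimate(y):
    """Difference out mu_i, then instrument dy_{t-1} with the level y_{t-2} (just-identified IV)."""
    dy = np.diff(y, axis=1)  # dy[:, k] = y[:, k + 1] - y[:, k]
    dy_now = dy[:, 1:]       # dy_t for t = 2, ..., T-1
    dy_lag = dy[:, :-1]      # dy_{t-1}
    z = y[:, :-2]            # instrument y_{t-2}: uncorrelated with e_t - e_{t-1}
    return float((z * dy_now).sum() / (z * dy_lag).sum())


y = simulate_dynamic_panel()
print("true rho: 0.5")
print("within (fixed effects):", round(within_estimate(y), 3))
print("Anderson-Hsiao IV:", round(anderson_hsiao_estimate(y), 3))
```

The within estimate falls well below the true value because demeaning correlates the lagged regressor with the demeaned error. The differenced IV estimate does not suffer from this problem. System GMM adds further moment conditions to gain precision.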
To see whether the Gini has a differential effect on TB incidence depending on how rich a country is, we use the 2018 World Bank classifications to divide our sample into high-, middle-, and low-income countries.

Results
The main results are presented in Figure 2, which shows the estimated coefficients along with their 95% confidence intervals. The coefficient on the Gini (β) in the TB estimation is near 0.5 and statistically significant at the 1% level. This means that if the Gini increases by 10%, TB incidence is predicted to rise by about 5%. More precisely, compare two countries, X and Y, where income inequality is 10% higher in X than in Y. Even after correcting for differences in mean income between the two, TB incidence in X is predicted to be about 5% higher than in Y. In sharp contrast, the Gini does not significantly affect diabetes or cancer. We find no evidence that ambient income inequality affects the probability of developing diabetes or cancer.
Higher average income significantly lowers TB incidence; the estimated coefficient is -0.27 and statistically significant at the 1% level. This means that a 10% increase in income is associated with an approximately 2.7% decrease in TB incidence.
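Because the model is in logs, "coefficient times percentage change" is a first-order approximation. The exact implied change is (1 + Δx)^β - 1. The sketch below applies this formula to the two coefficients reported above; the function name is ours.

```python
def implied_pct_change(elasticity, pct_change_in_x):
    """Exact % change in y implied by a log-log coefficient for a given % change in x."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100


print(round(implied_pct_change(0.5, 10), 2))    # 4.88  (Gini up 10% -> TB incidence)
print(round(implied_pct_change(-0.27, 10), 2))  # -2.54 (GDP per capita up 10% -> TB incidence)
```

For a 10% change, the approximations of 5% and 2.7% are close to the exact values.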
Diabetes is positively associated with GDP per capita. Diabetes prevalence increases with income in low- and middle-income countries, presumably because of the nutritional transition (changes in diet and physical activity) 19 sweeping these countries. Cancer incidence is not statistically correlated with income.
Figure 3 presents the results of the TB estimation for different country subgroups. Income inequality is positively associated with TB incidence in low- and high-income countries, and the association is statistically significant at the 10% level. The effect of GDP per capita is not statistically significant within income subgroups.
Ruling out the statistical-artefact possibility
In cross-country multivariate regressions involving inequality measures, there is always the danger that the findings are simply statistical artefacts. A correlation between a population health indicator and income inequality can arise in aggregate-level data even when, in individual-level data, income inequality has no effect on anyone's health. Suppose there is a nonlinear relationship between income and health 20. In that case, two countries with the same average income will differ in their average health if they have different income distributions. The difference in health arises because of an aggregation effect that is positively related to the difference in income inequality between the countries: the larger the income variance in the population, the larger the statistical artefact. If this is the case, a negative relationship between health and income inequality no longer implies that inequality is a health hazard in and of itself; the relationship could be driven by the aggregation effect.
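The aggregation effect is Jensen's inequality at work. In the sketch below, health is assumed, purely for illustration, to be the logarithm of income, which gives a concave gradient. Two hypothetical countries have the same mean income but different spreads. No individual's health depends on anyone else's income, yet average health differs. With a linear gradient, the gap vanishes.

```python
import math


def average_health(incomes, health_of_income):
    """Population-average health when each person's health depends only on own income."""
    return sum(health_of_income(x) for x in incomes) / len(incomes)


equal = [50, 50, 50, 50]
unequal = [10, 30, 70, 90]  # same mean income (50), larger spread

# Concave gradient (assumed: health = log income): the unequal country looks less healthy.
print(average_health(equal, math.log), average_health(unequal, math.log))
# Linear gradient: no aggregation effect, both averages equal 50.0.
print(average_health(equal, lambda x: x), average_health(unequal, lambda x: x))
```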
To overcome this problem, we run similar estimations for diabetes and cancer using country-level data, which are therefore subject to the aggregation effect. We then compare the results across infectious and non-communicable diseases. TB is significantly associated with income inequality, while the non-communicable diseases are not. This strengthens our argument that infectivity interacts with income inequality. Bakkeli 21 also finds that, in China, income inequality does not have a statistically significant influence on individuals' probability of having health problems, where health is measured by blood pressure or obesity.

Principal findings
This paper studies the connection between health and income inequality and finds the two to be related when health is affected by infectious disease. Longitudinal panel data from a large set of countries reveal that, even after controlling for average income, income inequality within a country increases the incidence of an infectious disease, TB. It does not increase the incidence of non-communicable diseases such as diabetes and cancer.
The direct effect of inequality is strong: all else equal, countries whose income Gini coefficients are 10% apart show a 5% difference in tuberculosis prevalence.
Comparisons with other studies
Nearly every study that explores the relation between health and income inequality uses a measure of mortality, such as life expectancy, as a proxy for health. Most studies report age-standardized or age-adjusted mortality rates. Others report period life expectancy, which combines age-specific mortality rates into a period survival curve from which life expectancy can be estimated. The majority of such studies find consistent and strong effects: "in a relatively inegalitarian country life expectancy may be between five and ten years lower than in a more egalitarian country." 22 Our approach instead uses disease incidence rather than a broad measure such as life expectancy. This choice is partly guided by our research question, which concerns the connection between disease and inequality. It is also motivated by the fact that disease is a large and, more importantly, direct contributor to health, morbidity, and mortality. Life expectancy, by contrast, is a more indirect measure that is affected by wars, famines, and so on.
Another important point of departure is that, unlike existing work, we go beyond simply reporting a correlation and present a causal connection between health and income inequality. Our statistical approach (Arellano and Bover/Blundell and Bond) offers a way around the standard endogeneity concerns and, unlike extant work, permits us to claim causality.

Caveats and limitations
Our analysis hints at, but is unable to test, a causal mechanism connecting disease spread and income inequities. Such testing would require micro data on mixing patterns among individuals with different incomes. Our argument relies on mixing being mostly random. In reality, it may not be: rich people may mix mostly with other rich people and deliberately stay away from places where mixing with the poor is unavoidable. Future research using contact-tracing data may help shed more light on this matter.
The inequality data are constructed as follows. We keep only high- and average-quality data and include only observations that cover the entire area and the entire population of the country. If more than one observation satisfies these conditions, an income-based Gini is preferred over an expenditure- or consumption-based one. If more than one Gini remains after this refinement, the following criteria are considered, in order 23: income definition, income-sharing unit, and unit of analysis. If the best available Gini is consumption-based, it is systematically lower than income inequality, so we follow the suggestion of Deininger and Squire (1996) 24 and add 6.6 to expenditure-based Ginis. If the coefficient is not available for a given year, the value is taken from the closest preceding observation, so the series forms a step function.
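The selection and filling rules above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code. It assumes Ginis on a 0-100 scale, so that the 6.6 adjustment is measured in points. It also implements only the income-versus-consumption preference and omits the further tie-breaking criteria. The function name `build_gini_series` and the input format are hypothetical.

```python
def build_gini_series(observations, years, consumption_gap=6.6):
    """Pick one Gini per year, then carry it forward to fill gaps (a step function).

    observations: iterable of (year, gini, basis), gini on a 0-100 scale,
    basis either "income" or "consumption" (expenditure treated the same way).
    """
    best = {}
    for year, value, basis in observations:
        # Consumption-based Ginis run low, so shift them up by the fixed gap.
        adjusted = value + consumption_gap if basis == "consumption" else value
        # Prefer an income-based figure over a consumption-based one for the same year.
        if year not in best or (basis == "income" and best[year][1] != "income"):
            best[year] = (adjusted, basis)

    series, last = {}, None
    for year in sorted(years):
        if year in best:
            last = best[year][0]
        series[year] = last  # None until the first observation appears
    return series


obs = [(2000, 35.0, "consumption"), (2000, 40.0, "income"), (2003, 30.0, "consumption")]
print(build_gini_series(obs, range(1999, 2005)))
```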

Estimation Strategy
Our model for TB incidence is given by equation (1). Because non-communicable diseases do not spread through interaction, we do not need to control for the prevalence of the disease when estimating the association between non-communicable diseases and income inequality.
We estimate equation (2) using fixed-effects estimation. The dependent variable takes two forms. The first is new diabetes incidence, proxied by the first difference of prevalence.
The second is cancer incidence: cervical and breast cancer cases per 100,000 female population.
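The two specifications can be written out as follows. This is a sketch reconstructed from the verbal description in this paper (logged outcomes, lagged log prevalence for TB, the listed controls, country and year effects). It is not the authors' typeset equations. The symbols ρ, β, γ, μ_i, τ_t, the one-year lag, and the control vector X_it are notational assumptions, and D_it denotes diabetes or cancer incidence. The source does not state how non-positive first differences of diabetes prevalence are handled under the log transformation.

```latex
\begin{align}
\ln \mathrm{TB}_{it} &= \rho \ln \mathrm{Prev}_{i,t-1} + \beta \ln \mathrm{Gini}_{it}
  + \gamma^{\top} X_{it} + \mu_i + \tau_t + \varepsilon_{it} \tag{1} \\
\ln D_{it} &= \beta_D \ln \mathrm{Gini}_{it} + \gamma_D^{\top} X_{it}
  + \mu_i + \tau_t + \varepsilon_{it} \tag{2}
\end{align}
```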

Aggregation Effect
In cross-country multivariate regressions using measures of inequality, there is always the danger that the findings are simply statistical artefacts. To see this, suppose the health-income gradient is concave (equivalently, disease risk is a convex, declining function of income). In that case, two countries with the same average income will differ in their average health if they have different income distributions. The difference in health arises because of an aggregation effect that is positively related to the difference in income inequality between the countries: the larger the income variance in the population, the larger the statistical artefact. If this is the case, a negative relationship between health and income inequality no longer implies that inequality is a health hazard in and of itself; the relationship could be driven by the aggregation effect.
We want to correctly identify the extent to which income inequality directly raises TB infection risk. To do so, we correct for the aggregation effect by running a regression similar to (1) with diabetes (and, separately, cancer) as the dependent variable. Our argument is that diabetes incidence is also an aggregate measure and therefore also suffers from the aggregation effect. However, diabetes is a non-communicable disease and hence does not have externalities. TB and diabetes thus both suffer from the aggregation effect, but the mechanism unique to infectious disease, in which infectivity interacts with income inequality, is present only in TB. The difference between the two diseases in their association with income inequality is therefore attributable to infectivity. Cancer incidence is the sum of cervical and breast cancer incidence.
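The logic of this correction can be summarized in three lines. The summary relies on a strong assumption that is implicit in the argument above: the aggregation bias b is the same for TB and for diabetes (or cancer). Here β_inf denotes the hypothesized infectivity channel; both symbols are notational assumptions.

```latex
\begin{aligned}
\operatorname{plim}\,\hat\beta_{\mathrm{TB}} &= \beta_{\mathrm{inf}} + b \\
\operatorname{plim}\,\hat\beta_{D} &= b \\
\Rightarrow\; \operatorname{plim}\,\bigl(\hat\beta_{\mathrm{TB}} - \hat\beta_{D}\bigr) &= \beta_{\mathrm{inf}}
\end{aligned}
```

The diabetes and cancer coefficients on the Gini are not statistically significant. Under this reading, the TB coefficient therefore reflects the infectivity channel rather than aggregation.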