Data
This study used the first and second rounds of the India Human Development Survey (IHDS). IHDS round I is a nationally representative, multi-topic survey of 41,554 households in 382 districts, 1503 villages, and 971 urban neighbourhoods of India conducted during 2004–05 (Desai et al., 2008). IHDS round II is also a multi-topic panel survey of 42,152 households in 384 districts, 1420 villages, and 1042 urban neighbourhoods of India conducted during 2011–12 (Desai & Vanneman, 2015). Both rounds of IHDS were conducted in all states and union territories of India (except the two union territories of Andaman and Nicobar Islands and Lakshadweep) using multi-stage stratified random sampling. For details of the IHDS sampling design, see Desai et al. (2010).
During the first and second rounds of IHDS, data on tobacco and alcohol consumption were collected from 33,116 and 34,090 individuals respectively. These subsets of the original cross-sectional datasets were used for the analysis of the respective rounds. Any observations with missing information regarding tobacco and alcohol consumption were retained for analysis. The association between media exposure and tobacco and alcohol consumption was examined separately for each round using the same set of outcome, explanatory, and control variables.
Outcome variables
During the first round of IHDS, interviewers collected information about whether an individual “smokes cigarettes, bidis or hukkah”, “chews tobacco” and “drinks alcohol”. The answers were coded into three categories: “never”, “sometimes”, “daily”. For analysis, the responses were converted to binary variables: persons in the “sometimes” or “daily” category were coded as “yes” and all others were coded as “no”. Thus, the three outcome variables used in the first round were whether a person smokes tobacco (no, yes), whether a person chews tobacco (no, yes), and whether a person drinks alcohol (no, yes).
A similar procedure was used to construct the outcome variables in the second round of IHDS. In the second round, interviewers collected information about whether an individual “smokes cigarettes”, “smokes bidis or hukkah”, “chews tobacco” and “drinks alcohol”. The answers were coded into four categories: “never”, “rarely”, “sometimes”, “daily”. For analysis, persons in the “sometimes” or “daily” category were coded as “yes” and all others were coded as “no”. The variables “smoking cigarettes” and “smoking bidis or hukkah” were combined into a single variable indicating whether a person smokes tobacco (no, yes). Persons who smoked cigarettes, bidis or hukkah, or both, were coded as “yes”; otherwise they were coded as “no”. The other two outcome variables in the analysis of second-round data were whether the person chews tobacco (no, yes) and whether the person drinks alcohol (no, yes).
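The recoding of the second-round outcome variables described above can be sketched as follows. This is a minimal illustration only: the field names and response labels are hypothetical and do not correspond to the actual IHDS variable names or codes, and the handling of missing responses is not shown.

```python
# Hypothetical response labels; the IHDS files use their own names and codes.
CONSUMES = {"sometimes", "daily"}


def to_binary(response):
    """Return 1 ("yes") for 'sometimes' or 'daily', otherwise 0 ("no")."""
    return 1 if response in CONSUMES else 0


def smokes_tobacco(cigarettes, bidis_hukkah):
    """Combine the two round II smoking items into a single indicator."""
    return 1 if (to_binary(cigarettes) or to_binary(bidis_hukkah)) else 0


# Two made-up respondents, used only to show the recoding.
respondents = [
    {"cigarettes": "never", "bidis_hukkah": "daily",
     "chews": "rarely", "drinks": "sometimes"},
    {"cigarettes": "rarely", "bidis_hukkah": "never",
     "chews": "daily", "drinks": "never"},
]

outcomes = [
    {
        "smokes_tobacco": smokes_tobacco(r["cigarettes"], r["bidis_hukkah"]),
        "chews_tobacco": to_binary(r["chews"]),
        "drinks_alcohol": to_binary(r["drinks"]),
    }
    for r in respondents
]
```

In this sketch “rarely” falls into the “no” group, following the rule that only “sometimes” and “daily” are coded as “yes”.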
Explanatory variables
Indicators of mass media exposure were used as the explanatory variables in both rounds. During the first round, interviewers asked respondents how often people in the household “listen to radio”, “read the newspaper” and “watch television (TV)”. The three questions were asked separately for men, women, and children of the household, and answers were recorded under three categories: “never”, “sometimes”, “daily”. This information was combined to construct three binary variables: whether anyone in the household listens to radio (no, yes), whether anyone in the household reads the newspaper (no, yes) and whether anyone in the household watches TV (no, yes). In each constructed variable, households in the “sometimes” or “daily” category of the original variable were recoded to “yes”; otherwise, they were recoded to “no”.
A similar recoding procedure was used for the mass media exposure variables in the second round of IHDS. Interviewers in the second round asked respondents how often people in the household “listen to radio”, “read the newspaper” and “watch TV”. Again, the three questions were asked separately for men, women, and children in the household, and answers were recorded under three categories: “never”, “sometimes”, “daily”. The information was combined to construct three binary variables: whether anyone in the household listens to radio (no, yes), reads the newspaper (no, yes), and watches TV (no, yes). Households in the “sometimes” or “daily” category of the original variable were recoded to “yes” in the constructed variable; otherwise, they were recoded to “no”.
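The construction of the household-level media exposure variables can be illustrated with a short sketch. It assumes that a household counts as exposed when the response for at least one of the three groups (men, women, children) is “sometimes” or “daily”. The labels and data layout are hypothetical, not the IHDS coding.

```python
# Hypothetical response labels; not the actual IHDS codes.
EXPOSED = {"sometimes", "daily"}


def anyone_exposed(men, women, children):
    """Return 1 if any group's response is 'sometimes' or 'daily', else 0."""
    return int(any(r in EXPOSED for r in (men, women, children)))


# One made-up household: responses for (men, women, children) per medium.
household = {
    "radio": ("never", "never", "never"),
    "newspaper": ("daily", "never", "never"),
    "tv": ("sometimes", "daily", "daily"),
}

exposure = {
    medium: anyone_exposed(*answers) for medium, answers in household.items()
}
```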
Control variables
Based on previous research, several factors that affect the media exposure and the tobacco and alcohol consumption behaviour of individuals were identified. These factors were individual-level demographic characteristics, household socio-economic characteristics, and relevant state-level characteristics. Accordingly, the following control variables were included in the analysis: age category of the individual (grouped in 10-year intervals); sex of the individual (male, female); level of education of the individual (no formal schooling, standard 1–5, standard 6–10, standard > 10); current working status of the individual (not working, working); current marital status of the individual (unmarried, married); highest educational level of an adult male aged more than 21 years in the household (no formal schooling, standard 1–5, standard 6–10, standard > 10); highest educational level of an adult female aged more than 21 years in the household (no formal schooling, standard 1–5, standard 6–10, standard > 10); wealth quintile of the household (poorest, poor, middle, rich, richest); caste of the household head (other backward class (OBC), scheduled castes (SC), scheduled tribes (ST), others); religion of the household head (Hindu, Muslim, others); place of residence (rural, urban); region (central, northern, southern, western, eastern, north-eastern).
Very few of the control variables had observations with missing information. In the first round of IHDS, there were 589 and 699 missing cases in the highest educational level of household female adult and household male adult variables respectively, out of 33,116 cases. In the second round of IHDS, there were 45, 523 and 707 missing cases in the caste of household head, highest educational level of household female adult and highest educational level of household male adult variables respectively, out of 34,090 cases. Observations with missing values for a particular variable were recoded either as “0” or to the “others” category, as appropriate. Since the number of individuals with missing information was small relative to the overall sample size, this recoding is unlikely to bias the analysis in this paper.
Analytical Methods
First, the study presents the overall distribution of individuals, and the distribution of individuals who smoke tobacco, chew tobacco and drink alcohol, across the relevant demographic, socio-economic and state-level variables in the dataset. It then provides evidence of unobserved state-level heterogeneity in the data using random intercept logit models with individuals at the first level and the state to which an individual belongs at the second level. In both rounds of IHDS, individuals are grouped under 28 states and 5 union territories, and each such entity (state or union territory) was treated as a separate state for analysis. No individual belonged to more than one state, and no state was without individuals. Further, the study provides evidence of the association between mass media exposure in households and tobacco and alcohol consumption behaviour in individuals using multivariable standard logit regression models and random intercept logit regression models, that is, before and after controlling for state-level heterogeneity respectively. Logit regression models were used because the dependent variables are binary.
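For reference, the two-level random intercept logit model described above can be written in its usual textbook form. The notation is an assumption introduced here for illustration and does not appear in the original text: $y_{ij}$ is the binary outcome for individual $i$ in state $j$, $\mathbf{x}_{ij}$ collects the explanatory and control variables, and $u_j$ is the state-level random intercept.

```latex
\operatorname{logit}\,\Pr\left(y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j\right)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim N\left(0, \sigma_u^2\right)
```

The standard logit model corresponds to the same expression without the term $u_j$, and the null model to the expression without $\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}$.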
To provide evidence of unobserved state-level heterogeneity, the study estimated a null model (an empty random intercept logit model with no explanatory or control variables) and a full model (a random intercept logit model with all the explanatory and control variables) for each of the three indicators of tobacco and alcohol consumption. Clustering and heterogeneity in the random intercept logit models were measured using the intra-class correlation coefficient (ICC) and the median odds ratio (MOR) respectively. The ICC is the ratio of the state-level (second-level) variance in the risk of tobacco or alcohol consumption to the sum of the individual-level and state-level variances (Merlo et al., 2006). In two-level random intercept logit regression models, the individual-level variance is fixed, so the ICC is a function of the state-level variance only, and its value lies between 0 and 1. The higher the ICC, the greater the degree of clustering of observations within states. The MOR is the median of the odds ratios of tobacco or alcohol consumption obtained by comparing, over all pairs of states, two individuals with the same covariate values, one from the higher-risk state and one from the lower-risk state (Merlo et al., 2006). The MOR is also a function of the state-level variance, and its value is always greater than or equal to 1. A higher MOR denotes greater heterogeneity in the risk of tobacco or alcohol consumption across the states.
To provide evidence of the association between media exposure and tobacco and alcohol consumption behaviour, the study estimated a standard logit regression model (without controlling for unobserved heterogeneity) and a random intercept logit model (controlling for unobserved heterogeneity) for each of the three indicators of tobacco and alcohol consumption. Associations in the standard and random intercept logit regression models are reported as odds ratios.
The odds ratio from a standard logit regression model gives the odds of tobacco or alcohol consumption among individuals with media exposure relative to the odds among individuals without media exposure, after controlling for the effect of the control variables (Cameron & Trivedi, 2005). The interpretation of the odds ratio is similar for random intercept logit regression models, except that they additionally control for the effect of unobserved state-level heterogeneity (Goldstein, 2011; Snijders & Bosker, 2011).
All of the above analytical procedures were carried out separately for each round of the IHDS dataset. All statistical estimations were performed using Stata (StataCorp, 2013).
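The ICC and MOR described above can both be computed from the estimated state-level variance. The sketch below uses the commonly used latent-variable formulation, in which the individual-level variance of a logit model is fixed at π²/3. This formulation is assumed here rather than stated in the text, and the function names and the variance value are purely illustrative, not estimates from this study.

```python
import math
from statistics import NormalDist

# Assumption: latent-variable formulation, where the individual-level
# variance of a logistic model is fixed at pi^2 / 3 (about 3.29).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc_latent(state_variance):
    """ICC = state-level variance / (state-level + individual-level variance)."""
    return state_variance / (state_variance + LOGISTIC_VARIANCE)


def median_odds_ratio(state_variance):
    """MOR = exp(sqrt(2 * variance) * z), z = 75th percentile of N(0, 1)."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2 * state_variance) * z75)


# Illustrative state-level variance, not a result from the paper.
example_variance = 1.0
example_icc = icc_latent(example_variance)
example_mor = median_odds_ratio(example_variance)
```

With a state-level variance of zero, the ICC is 0 and the MOR is 1, which matches the bounds given in the text.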
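As a brief illustration of how an odds ratio is obtained from a fitted logit model, the sketch below exponentiates a coefficient and the ends of its Wald confidence interval. The function name and the coefficient and standard error values are hypothetical and are not estimates from this study.

```python
import math
from statistics import NormalDist


def odds_ratio(coef, std_err, level=0.95):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient.

    The bounds are the exponentiated Wald confidence limits.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


# Hypothetical coefficient on a media exposure indicator (illustrative only).
or_value, or_lower, or_upper = odds_ratio(coef=math.log(2), std_err=0.1)
```

An odds ratio above 1 indicates higher odds of consumption among individuals with media exposure, and an odds ratio below 1 indicates lower odds.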