Data and Variables
Data for this study are obtained from the four rounds of the National Family Health Survey (NFHS), the Indian version of the Demographic and Health Surveys (DHS), carried out between 1992-93 and 2015-16 by the International Institute for Population Sciences (IIPS), Mumbai; ORC Macro; Macro International Inc.; and ICF [1, 41-43]. Although the state of Bihar was reorganized in 2000, data for the second round (1998-99) were extracted for the districts that make up present-day Bihar, using district codes, so that they are comparable with the third round (2005-06). In 1992-93, in undivided Bihar, the survey collected information on 3,575 children born during the four years preceding the survey. In 1998-99, 2005-06 and 2015-16, information was collected on 2,948, 2,320 and 3,679 children, respectively. In 1998-99 and 2005-06, information was collected for children born during the three years preceding the survey, whereas in 2015-16 the reference period was five years. For this reason, the study is restricted to children aged 0-36 months so that childhood stunting can be compared across the four rounds of NFHS. Note that the study compares changes in covariate and coefficient effects only between successive rounds (NFHS 1 and NFHS 2; NFHS 2 and NFHS 3; NFHS 3 and NFHS 4), not between non-adjacent rounds such as NFHS 1 and NFHS 4.
Stunting is defined as a height-for-age z-score (HAZ) below minus two standard deviations (−2 SD) of the WHO International Reference Standard. It is widely regarded as a standard indicator of child undernutrition and health status because it reflects chronic undernutrition caused by long-term deprivation. A child’s height-for-age measures their height relative to a healthy standard population of the same sex and the same age in months. It is expressed as a z-score: the difference between the height of the observed child and the average height of healthy children, scaled by the standard deviation of child height in the healthy population. A child with a HAZ of zero is as tall as the average child in the healthy reference population; a child with a negative HAZ is shorter than that average child. The HAZ score is calculated as

$$\mathrm{HAZ}_i = \frac{H_i - \bar{H}_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}$$

where $H_i$ is the observed height of child $i$, and $\bar{H}_{\mathrm{ref}}$ and $\sigma_{\mathrm{ref}}$ are the average and standard deviation of height among healthy reference children of the same sex and age in months.
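The z-score described above can be illustrated with a minimal Python sketch. It assumes the reference mean and standard deviation for the child's sex and age in months have already been looked up from the reference tables, which are not reproduced here; the function names and the reference values in the example are hypothetical.

```python
def haz_score(height_cm: float, ref_mean_cm: float, ref_sd_cm: float) -> float:
    """Height-for-age z-score: (observed height - reference mean) / reference SD.

    ref_mean_cm and ref_sd_cm are assumed inputs for the child's sex and
    age in months, taken from an external reference table.
    """
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (height_cm - ref_mean_cm) / ref_sd_cm


def is_stunted(haz: float) -> bool:
    """Stunting as defined in the text: HAZ strictly below -2 SD."""
    return haz < -2.0


# Hypothetical reference values, for illustration only.
z = haz_score(height_cm=76.0, ref_mean_cm=85.0, ref_sd_cm=3.0)
print(z, is_stunted(z))  # -3.0 True
```

Under the strict "below −2 SD" rule, a child at exactly −2.0 is not classified as stunted.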
Complete information on the HAZ score was available for 1,821, 1,627, 1,188 and 2,184 children aged 0-36 months in the four rounds, respectively. HAZ is used as the outcome variable in all regression models. Child characteristics include the current age of the child (in months) and its square, sex of the child (male, female), size of the child at birth (larger than average, average, small) as a proxy for birth weight, early initiation of breastfeeding (no, yes), and number of siblings. Receipt of any services from the Integrated Child Development Services (ICDS) during the 12 months preceding the survey is included when comparing NFHS 3 and NFHS 4, because this information was collected only in these rounds. Maternal characteristics comprise age of the mother at first birth (in years), maternal education (in completed years), work status (working, not working) and degree of media exposure (an additive index of three binary variables: reading a newspaper, watching television and listening to the radio at least once a week). Institutional delivery (no, yes) is included as a proxy for the mother’s contact with health personnel. Maternal height, maternal BMI and anaemia (no, mild, moderate, severe) are included in the analyses of the second, third and fourth rounds of NFHS, because this information was not collected in the first round. Similarly, normalized factor scores of variables indicating household decision making, freedom of movement, etc. are incorporated as maternal-level variables for the second, third and fourth rounds of NFHS (see endnote 1).
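Two of the constructed covariates can be sketched in Python: the quadratic age terms and the additive media-exposure index. Only the definitions (age and its square; the sum of three weekly-exposure indicators, ranging from 0 to 3) come from the text; the function names and dictionary keys are hypothetical.

```python
from typing import Dict, Tuple


def media_exposure_index(reads_newspaper: bool, watches_tv: bool, listens_radio: bool) -> int:
    """Additive index (0-3) of three 'at least once a week' exposure indicators."""
    return int(reads_newspaper) + int(watches_tv) + int(listens_radio)


def age_terms(age_months: int) -> Tuple[int, int]:
    """Age in months and its square, so HAZ can follow a curved age profile."""
    return age_months, age_months ** 2


def covariate_row(age_months: int, reads_newspaper: bool, watches_tv: bool,
                  listens_radio: bool) -> Dict[str, int]:
    """Hypothetical record layout holding the two constructed covariates."""
    age, age_sq = age_terms(age_months)
    return {
        "age": age,
        "age_sq": age_sq,
        "media_exposure": media_exposure_index(reads_newspaper, watches_tv, listens_radio),
    }


print(covariate_row(18, True, False, True))
# {'age': 18, 'age_sq': 324, 'media_exposure': 2}
```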
Household wealth index, religion (Hindu, Muslim/others) and social group (scheduled castes (SC), scheduled tribes (ST), others) are included as household-level variables. Because the first round of NFHS did not collect data on ‘other backward classes’ (OBCs), OBCs are categorised under ‘Others’. The DHS household wealth index is based on possession of household durable assets, access to safe drinking water and sanitation, and landholding. To construct the index, the variables were first broken into sets of dichotomous variables, and indicator weights were assigned using principal component analysis (PCA), as suggested by Filmer and Pritchett. In addition to the child, maternal and household characteristics, place of residence (rural, urban) is also included in the regression models.
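A minimal NumPy sketch of the Filmer and Pritchett approach: standardize the dichotomous indicators, take the first principal component, and use its loadings as weights, so that each household's score is the weighted sum of its standardized indicators. The input layout, function name and toy data are assumptions; the DHS implementation may differ in details not shown here.

```python
import numpy as np


def wealth_index(indicators: np.ndarray) -> np.ndarray:
    """First-principal-component score of standardized 0/1 indicators.

    indicators: (n_households, n_items) array of dichotomous asset/amenity
    variables (hypothetical layout, e.g. owns a radio, has piped water).
    """
    x = np.asarray(indicators, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns carry no information; avoid division by zero
    z = (x - x.mean(axis=0)) / sd
    # Covariance of z is (up to a factor n/(n-1)) the correlation matrix.
    corr = np.cov(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues returned in ascending order
    weights = eigvecs[:, -1]           # loadings of the first principal component
    # An eigenvector's sign is arbitrary; orient it so owning more items raises the score.
    if weights.sum() < 0:
        weights = -weights
    return z @ weights


# Toy indicator matrix: 6 households x 3 items (hypothetical).
X = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 1]])
print(np.round(wealth_index(X), 3))
```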
To assess differentials in HAZ scores over the study period, we first estimated the distribution of HAZ scores of Bihar’s children separately for each survey round using kernel smoothing, and then computed period-wise differences at each quantile to obtain the raw difference in HAZ scores across the distribution.
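The two steps can be sketched in NumPy: a Gaussian kernel density estimate for each round, and the raw gap between rounds at selected quantiles. The Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions (the text does not report the kernel or bandwidth), and the samples below are synthetic, not NFHS data.

```python
import numpy as np


def kernel_density(sample, grid, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated at each point of `grid`.

    Default bandwidth: Silverman's rule of thumb, 0.9 * min(sd, IQR/1.34) * n**(-1/5).
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    if bandwidth is None:
        sd = x.std(ddof=1)
        iqr = np.subtract(*np.percentile(x, [75, 25]))
        spread = min(sd, iqr / 1.34) if iqr > 0 else sd
        bandwidth = 0.9 * spread * n ** (-0.2)
    u = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def quantile_gaps(haz_t1, haz_t2, taus):
    """Raw difference q_tau(t2) - q_tau(t1) at each quantile in `taus`."""
    return np.quantile(haz_t2, taus) - np.quantile(haz_t1, taus)


# Synthetic HAZ samples for two rounds (not NFHS data).
rng = np.random.default_rng(0)
haz_t1 = rng.normal(-2.0, 1.5, size=1500)
haz_t2 = rng.normal(-1.6, 1.4, size=2000)
grid = np.linspace(-7.0, 4.0, 221)
dens_t1, dens_t2 = kernel_density(haz_t1, grid), kernel_density(haz_t2, grid)
print(quantile_gaps(haz_t1, haz_t2, [0.1, 0.25, 0.5, 0.75, 0.9]))
```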
One of the primary objectives of the present study was to decompose the period-wise differences in children’s HAZ scores, across the entire HAZ distribution, into a covariate effect, i.e. the part of the difference arising from differences in the levels of characteristics (the composition of children) between survey periods, and a coefficient effect, i.e. the part caused by differences in the returns to those characteristics (the structure). Most earlier studies have modelled nutrition outcomes either at the mean, for example HAZ scores using ordinary least squares (OLS), or as the prevalence of stunting, underweight or wasting using logit or probit regression. A limitation of these approaches is that the effects of covariates are constrained to be the same along the entire distribution of the outcome variable. Further, OLS-based decompositions apply only to period-wise mean differences in HAZ scores, not to other distributional characteristics such as quantiles.
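For contrast, the mean-only decomposition that the paragraph refers to can be sketched as a two-fold Blinder-Oaxaca split of OLS results. Using period-1 coefficients as the reference is an assumption (other weightings exist), the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def _design(X, n):
    """Prepend an intercept column; accepts 1-D or 2-D covariate arrays."""
    return np.column_stack([np.ones(n), np.asarray(X, dtype=float).reshape(n, -1)])


def oaxaca_blinder_mean(y1, X1, y2, X2):
    """Two-fold decomposition of mean(y2) - mean(y1) with period-1 coefficients as reference.

    covariate effect   = (xbar2 - xbar1) @ b1   (differences in characteristics)
    coefficient effect = xbar2 @ (b2 - b1)      (differences in returns)
    """
    D1, D2 = _design(X1, len(y1)), _design(X2, len(y2))
    b1, *_ = np.linalg.lstsq(D1, np.asarray(y1, dtype=float), rcond=None)
    b2, *_ = np.linalg.lstsq(D2, np.asarray(y2, dtype=float), rcond=None)
    xbar1, xbar2 = D1.mean(axis=0), D2.mean(axis=0)
    return (xbar2 - xbar1) @ b1, xbar2 @ (b2 - b1)


# Synthetic example: one covariate (years of maternal education), not NFHS data.
rng = np.random.default_rng(1)
edu1, edu2 = rng.integers(0, 12, 1500), rng.integers(0, 15, 2000)
y1 = -2.2 + 0.05 * edu1 + rng.normal(0, 1.4, 1500)
y2 = -2.0 + 0.07 * edu2 + rng.normal(0, 1.3, 2000)
cov_eff, coef_eff = oaxaca_blinder_mean(y1, edu1, y2, edu2)
print(cov_eff, coef_eff, y2.mean() - y1.mean())
```

Because OLS with an intercept reproduces the sample means, the two effects add up exactly to the mean gap, but they carry no information about gaps at other quantiles.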
The quantile regression (QR) method developed by Koenker and Bassett allows the effects of covariates to vary across the entire distribution of a continuous response variable. A limitation of this model is that it estimates only the conditional quantile effects of changes in covariates. In the present study, we were interested in estimating the effect of a policy intervention, for instance improving mothers’ education, in a population of individuals with different characteristics (i.e. unconditional effects), rather than its impact on sub-groups with specific values of the covariates (i.e. conditional effects). We therefore employed the unconditional recentered influence function (RIF) quantile regression developed by Firpo et al. to assess the unconditional quantile effects of changes in covariates. The method regresses a transformation of the dependent variable (Y), the recentered influence function (RIF), on the explanatory variables (X). An advantage of this method is that it allows the contribution of each explanatory variable to each component of the HAZ decomposition to be estimated, thereby extending the Blinder-Oaxaca decomposition to distributional statistics other than the mean. The rationale for applying such a quantile-regression-based counterfactual decomposition (QR-CD) approach is strengthened if there are important differences across the HAZ distribution in the relative contributions of covariate and coefficient effects to period-wise changes.
To estimate the unconditional quantile regression, we first derive the RIF of the response variable (the HAZ score, in our case). The RIF for the τth quantile is given by

$$\mathrm{RIF}(Y; q_\tau) = q_\tau + \frac{\tau - I(Y \le q_\tau)}{f_Y(q_\tau)} \qquad (1)$$
where $f_Y(q_\tau)$ is the marginal density function of $Y$ at the point $q_\tau$, estimated by kernel methods; $q_\tau$ is the sample $\tau$th quantile; and $I(Y \le q_\tau)$ is an indicator function showing whether the value of the outcome variable is at or below $q_\tau$. The RIF provides a linear approximation to a non-linear functional $\nu(Y)$ of the distribution of $Y$ (such as the median) and thus allows partial effects to be computed for single covariates. Firpo et al. have also shown that the RIF quantile regression can be implemented as an OLS regression of the transformed dependent variable on the covariates ($X$). In this study, for two periods $t_1$ and $t_2$, the RIF regressions for the HAZ score in each period are estimated as:
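The RIF defined by these components can be computed directly, as in the NumPy sketch below. The Gaussian kernel and the normal-reference bandwidth (1.06 · sd · n^(−1/5)) are illustrative assumptions, the function name is hypothetical, and the HAZ values are synthetic.

```python
import numpy as np


def rif_quantile(y, tau, bandwidth=None):
    """Recentered influence function of the tau-th quantile:

        RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau)

    f_Y(q_tau) is a Gaussian kernel density estimate at the sample quantile.
    Default bandwidth: normal-reference rule 1.06 * sd * n**(-1/5) (an assumption).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    q = np.quantile(y, tau)
    if bandwidth is None:
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-0.2)
    f_q = np.exp(-0.5 * ((q - y) / bandwidth) ** 2).sum() / (n * bandwidth * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q


# Synthetic HAZ values (not NFHS data).
rng = np.random.default_rng(2)
haz = rng.normal(-1.8, 1.5, size=2000)
rif_median = rif_quantile(haz, 0.5)
print(rif_median.mean(), np.quantile(haz, 0.5))  # equal up to floating-point rounding
```

The RIF takes only two values (one for children at or below $q_\tau$, one above), and its sample mean equals $q_\tau$ whenever the share of observations at or below $q_\tau$ equals $\tau$. This property is what lets an OLS regression of the RIF on $X$ recover effects on the unconditional quantile.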
$$E\left[\mathrm{RIF}(Y_{i \in g}; q_\tau) \mid X_{i \in g}\right] = X_{i,g}\,\beta_{\tau,g}, \qquad g = t_1, t_2 \qquad (2)$$
The coefficients $\beta_{\tau,g}$ represent the approximate marginal effects of the predictor variables on the unconditional HAZ quantile $q_\tau$ for children aged 0-35 months in periods $g = t_1, t_2$.
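The RIF regression in equation (2) and the subsequent two-fold split can be sketched in NumPy. The sketch computes the RIF of each period's HAZ, regresses it on an intercept and the covariates by OLS, and applies Blinder-Oaxaca-style weights to the RIF coefficients. Several assumptions apply: period $t_1$ coefficients serve as the reference, the kernel and bandwidth are illustrative choices, standard errors (often bootstrapped in practice) are omitted, and the data are synthetic. This is not the authors' code.

```python
import numpy as np


def rif_quantile(y, tau):
    """RIF of the tau-th quantile; Gaussian KDE with a normal-reference bandwidth (assumed)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    q = np.quantile(y, tau)
    h = 1.06 * y.std(ddof=1) * n ** (-0.2)
    f_q = np.exp(-0.5 * ((q - y) / h) ** 2).sum() / (n * h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q


def rif_decomposition(y1, X1, y2, X2, tau):
    """Split q_tau(t2) - q_tau(t1) into covariate and coefficient effects.

    RIF regression: OLS of each period's RIF on an intercept and X;
    period t1 coefficients are the reference (an assumption).
    """
    def design(X, n):
        return np.column_stack([np.ones(n), np.asarray(X, dtype=float).reshape(n, -1)])

    D1, D2 = design(X1, len(y1)), design(X2, len(y2))
    b1, *_ = np.linalg.lstsq(D1, rif_quantile(y1, tau), rcond=None)
    b2, *_ = np.linalg.lstsq(D2, rif_quantile(y2, tau), rcond=None)
    xbar1, xbar2 = D1.mean(axis=0), D2.mean(axis=0)
    covariate = (xbar2 - xbar1) @ b1  # change in characteristics, valued at t1 returns
    coefficient = xbar2 @ (b2 - b1)   # change in returns, evaluated at t2 characteristics
    return covariate, coefficient


# Synthetic data with two covariates (maternal education, urban residence); not NFHS data.
rng = np.random.default_rng(3)
n1, n2 = 1500, 2000
X1 = np.column_stack([rng.integers(0, 12, n1), rng.integers(0, 2, n1)])
X2 = np.column_stack([rng.integers(0, 15, n2), rng.integers(0, 2, n2)])
y1 = -2.3 + 0.05 * X1[:, 0] + 0.3 * X1[:, 1] + rng.normal(0, 1.4, n1)
y2 = -2.0 + 0.07 * X2[:, 0] + 0.2 * X2[:, 1] + rng.normal(0, 1.3, n2)
for tau in (0.1, 0.5, 0.9):
    cov_eff, coef_eff = rif_decomposition(y1, X1, y2, X2, tau)
    gap = np.quantile(y2, tau) - np.quantile(y1, tau)
    print(f"tau={tau}: covariate={cov_eff:.3f} coefficient={coef_eff:.3f} gap={gap:.3f}")
```

The two effects always sum to the difference in mean RIFs. That difference matches the raw quantile gap when the share of observations at or below each sample quantile equals $\tau$, as it does for these sample sizes, and approximates it otherwise.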
Although the study started from a reduced form of the UNICEF conceptual framework, the covariate set required further refinement, because decomposing observed HAZ differences into covariate and coefficient effects requires well-specified regression models that include the key relevant covariates. The final regression models include the covariates representing child, maternal, household and spatial characteristics described in the preceding section.
We have tried to minimize endogeneity problems, consistent with previous literature [29-30], although endogeneity could persist and make the parameters difficult to interpret. However, as O’Donnell et al. noted, the objective of counterfactual decomposition is not solely causal identification but to explain variation in children’s HAZ and to determine the relative magnitudes of covariate and coefficient effects. The coefficients of potentially endogenous variables should therefore be interpreted cautiously, although the decomposition itself remains valid.