1.1 Study area and period
Ethiopia is situated in the Horn of Africa and comprises nine regional states and two city administrations. The capital city is Addis Ababa. The study covered all nine regions and both city administrations, as included in the 2016 EDHS [3]. The study was conducted from May to December 2018.
The current health policy of Ethiopia emphasizes the preventive and health-promotion components of health care, which are expected to resolve most of the health problems of the population [17].
1.2 Data source
The child dataset of the 2016 EDHS was used for this analysis. It is the latest large-scale national demographic and health survey dataset, collected by the Central Statistical Agency (CSA) from January 18, 2016 to June 27, 2016 with a nationally representative sample from nine regions and two administrative cities. The study included a total of 2110 infants whose birth weights were recalled. Details of the sampling design and sample selection are available in the 2016 EDHS report [3].
1.3 Study design
A cross-sectional study design was used to identify multilevel factors associated with low birth weight (LBW) using the 2016 EDHS data collected by the CSA.
1.4 Population
1.4.1 Source population
All live births in the five years preceding the survey to women of reproductive age (15-49 years) who were residents of the nine regions and two administrative cities of Ethiopia during the survey.
1.4.2 Study population
All live births in the five years preceding the survey to women of reproductive age (15-49 years) who were residents of the selected households in the selected enumeration areas during the survey.
1.4.3 Sample population
All live births in the five years preceding the survey to women of reproductive age (15-49 years) who were residents of the selected households in the selected enumeration areas during the survey and who fulfilled the inclusion criteria.
1.5 Eligibility criteria
1.5.1 Inclusion and exclusion criteria
All infants in the EDHS 2016 sampled areas who were weighed at birth were included in the study.
Infants whose mothers had not had them weighed and did not know their birth weight were excluded.
1.5.2 Operational and standard definition
Normal birth weight: a birth weight greater than or equal to 2500 grams.
Low birth weight: a birth weight less than 2500 grams.
Individual level factors: variables operating at the lowest (individual) level, including child, parental and household characteristics.
Area level factors: the term Area refers to a cluster of individuals living within the same geographic environment.
1.6 Sample size and Sampling procedures
Each region was stratified into urban and rural areas, yielding 21 sampling strata. Samples of enumeration areas were selected independently in each stratum in two stages. In the first stage, 645 enumeration areas (202 urban and 443 rural) were selected. In the second stage, 28 households per cluster were selected with equal-probability systematic selection from the newly created household listing. Overall, 18,008 households were selected, of which 17,067 were occupied. Selected households were visited and interviewed. All women aged 15-49 years were eligible to be interviewed. Of all records in the child dataset with birth weight information, 2110 infants were eligible for this study.
Figure 2: Graphical representation of sampling procedure for low birth weight in Ethiopia from EDHS 2016.
1.7 Data quality control
Data related to the outcome variable, low birth weight, were selected and extracted from the child dataset of the 2016 EDHS. Further data cleaning, labeling, coding and recoding were done for all selected variables. Continuous and categorical variables were categorized on the basis of the published literature.
1.8 Study variables
1.8.1 Dependent variable
The outcome variable is low birth weight.
For the ith birth, the dependent variable was represented by a random variable $Y_i$ with two possible values, coded 1 and 0, so the response was measured as a dichotomous variable.
($Y_i = 1$ if low birth weight occurred; otherwise $Y_i = 0$)
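The coding of the outcome can be sketched as follows. This is a minimal illustration of the 2500 gram cut-off given in the operational definitions; the function name and the example weights are hypothetical and are not taken from the EDHS data.

```python
LBW_THRESHOLD_GRAMS = 2500  # cut-off from the operational definitions


def code_lbw(birth_weight_grams):
    """Return 1 for low birth weight (< 2500 g), otherwise 0."""
    return 1 if birth_weight_grams < LBW_THRESHOLD_GRAMS else 0


# Hypothetical birth weights in grams, for illustration only
weights = [2400, 2500, 3100, 1800]
outcomes = [code_lbw(w) for w in weights]
print(outcomes)  # [1, 0, 0, 1]
```

A birth weight of exactly 2500 grams is coded 0, in line with the definition of normal birth weight as greater than or equal to 2500 grams.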
1.8.2 Independent variables
The explanatory variables were considered at two levels: individual level and Area level factors.
Individual level variables
Individual level factors considered in this study include: age of mother, maternal age at first birth, gestational age at birth, birth order, type of birth, sex of the child, sex of household head, marital status, mother’s educational level, husband/partner’s educational level, maternal occupation, husband/partner’s occupation, cigarette smoking, household wealth index, religion, household size, media exposure, and body mass index.
Area level variables include: region, place of residence, Area media exposure, Area educational status, and Area poverty status. The aggregated Area level predictor variables were constructed by aggregating individual level values at the cluster level, and the aggregated variables were then dichotomized based on the distribution of the proportion values calculated for each cluster (Area).
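The construction of an aggregated Area level variable can be sketched as below. The text does not state the cut-off used for the binary categorization, so the median of the cluster proportions is used here purely as an assumption; the function names and the example records are hypothetical.

```python
from collections import defaultdict
from statistics import median


def cluster_proportions(records):
    """Proportion of individuals with indicator 1 in each cluster.

    records: iterable of (cluster_id, indicator) pairs, indicator in {0, 1}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cluster_id, indicator in records:
        totals[cluster_id] += 1
        positives[cluster_id] += indicator
    return {c: positives[c] / totals[c] for c in totals}


def categorize_clusters(proportions):
    """Code a cluster 1 ("high") if its proportion is at or above the median
    of all cluster proportions, else 0 ("low").
    The median cut-off is an assumption, not a detail from the study."""
    cutoff = median(proportions.values())
    return {c: int(p >= cutoff) for c, p in proportions.items()}


# Hypothetical records: (cluster id, mother has media exposure)
records = [(1, 1), (1, 1), (1, 0), (2, 0), (2, 0), (3, 1), (3, 0)]
props = cluster_proportions(records)
area_media_exposure = categorize_clusters(props)
```

Each individual in a cluster would then carry the cluster's binary value as an Area level covariate.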
1.9 Statistical methods of Analysis
Multilevel modeling: The nested nature of the data makes traditional regression methods inappropriate, because their inherent assumptions of independence among individuals within the same group and of equal variance across groups are violated. A multilevel model, a type of regression analysis designed for hierarchically structured data such as the DHS, is therefore more appropriate and yields robust standard errors. In this study, multilevel binary logistic regression analysis was employed to account for the hierarchical nature of the DHS data and the binary outcome variable.
1.9.1 Data analysis
1.9.2 Descriptive analysis
Frequencies and percentages were reported for categorical and continuous explanatory variables. In addition, cross-tabulation was used to show the proportion of each category of each characteristic with respect to LBW.
1.9.3 Multilevel Analysis
Bivariate Multilevel Logistic Regression Analysis
Bivariate MLRA was employed to explore the association between the dependent variable and a wide range of independent variables. Variables with a p-value ≤0.25 were entered into the multivariable multilevel logistic regression, which controls for the effects of confounding variables [18].
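The screening step can be sketched as a simple filter on the bivariate p-values. The variable names and p-values below are hypothetical; only the 0.25 threshold comes from the text.

```python
def screen_variables(bivariate_p_values, threshold=0.25):
    """Return the names of variables whose bivariate p-value is <= threshold."""
    return [name for name, p in bivariate_p_values.items() if p <= threshold]


# Hypothetical bivariate p-values, for illustration only
p_values = {"maternal_age": 0.03, "sex_of_child": 0.40, "wealth_index": 0.25}
selected = screen_variables(p_values)
print(selected)  # ['maternal_age', 'wealth_index']
```

A variable with a p-value of exactly 0.25 is retained because the criterion is inclusive.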
Multivariable Multilevel Logistic Regression Analysis
A multilevel logistic regression model was fitted to examine the individual and Area level factors associated with low birth weight, using the variables with a p-value ≤0.25 in the bivariate multilevel logistic regression analysis. Variables with a p-value less than 0.05 were considered significant predictors. Results were presented as adjusted odds ratios (AOR) with 95% confidence intervals (CI).
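An odds ratio and its confidence interval are commonly obtained by exponentiating a logistic regression coefficient and its Wald-type limits. The sketch below assumes this standard approach; the text does not state how the statistical software computed the intervals, and the coefficient and standard error are hypothetical.

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Odds ratio with a Wald-type 95% CI from a log-odds coefficient.

    z=1.96 is the normal quantile for a two-sided 95% interval.
    """
    point = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return point, lower, upper


# Hypothetical coefficient (log of 2) and standard error
aor, ci_low, ci_high = odds_ratio_with_ci(math.log(2), 0.2)
print(round(aor, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 corresponds to a Wald p-value below 0.05 for that coefficient.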
Model specification
This multilevel analysis used a two-level model: level one comprises the individual variables and level two is the Area level. The analytical strategy consisted of four models.
The first model, usually called the “empty” or “null” model, is fitted without explanatory variables. In other words, it contains no covariates but decomposes the total variance into individual and Area components. The empty model is used to determine whether the overall differences in LBW between communities and between individuals are significant.
$$Y = \ln\left[\frac{P_{ij}}{1 - P_{ij}}\right] = b_{0j} + u_{0j} \tag{1}$$
In the above equation, $P_{ij}$ is the probability of LBW, $b_{0j}$ is the overall regression intercept when all predictors are set to zero, and $u_{0j}$ is the residual at the Area level.
The second model, referred to as the “individual model”, included individual level characteristics. This allows assessment of the association between the outcome variable and individual level characteristics. The model containing the individual level variables is used to determine whether the variation across communities can be explained by the characteristics of the individuals residing within each Area.
$$Y_{ij} = \beta_{0j} + \beta_1 X_{1ij} + \cdots + \beta_n X_{nij} + u_{0j} + e_{ij} \tag{2}$$
In the above model, $\beta_{0j}$ is the intercept, $\beta_1, \ldots, \beta_n$ are the regression coefficients (slopes) of the explanatory variables, $X_{1ij}, \ldots, X_{nij}$ are the individual level factors, $u_{0j}$ is the residual at the Area level, and $e_{ij}$ is the usual random error term.
The third model contains only the Area level characteristics, to allow assessment of the impact of the Area level variables on the outcome variable.
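$$Y_{ij} = \beta_{0j} + \beta_1 Z_{1j} + \cdots + \beta_n Z_{nj} + u_{0j} + e_{ij} \tag{3}$$
Each cluster has its own intercept $\beta_{0j}$; $\beta_1, \ldots, \beta_n$ are the slope coefficients, $Z_{1j}, \ldots, Z_{nj}$ are the Area level factors, $u_{0j}$ is the random residual error term at the cluster level, and $e_{ij}$ is the random error term at the individual level.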
Lastly, a fourth model, called the “final model”, was fitted. It includes explanatory variables at both the individual and Area level simultaneously. The final model is used to test for the independent effect of Area contextual variables over and above the individual variables. The simultaneous inclusion of both individual and Area level predictors in the multilevel logistic regression model permits: (1) the examination of Area effects after individual level confounders have been controlled for; (2) the examination of individual level characteristics as modifiers of the Area effect (and vice versa); and (3) the simultaneous examination of within and between Area variability in outcomes, and of the extent to which between Area variation is explained by individual and Area level characteristics.
The final model is expressed as:
$$\ln\left[\frac{P_{ij}}{1 - P_{ij}}\right] = b_{0j} + b_1 X_{1ij} + \cdots + b_m X_{mij} + b_{m+1} Z_{1j} + \cdots + b_n Z_{(n-m)j} + u_{0j} + e_{ij}$$
- $P_{ij}$ is the probability of LBW for the ith birth in the jth Area
- $b_{0j}$ is the intercept on the log-odds scale
- $b_1, \ldots, b_n$ are the regression coefficients estimated from the data
- $X_{1ij}, \ldots, X_{mij}$ are the covariates (independent variables) defined at the individual level
- $Z_{1j}, \ldots, Z_{(n-m)j}$ are the covariates (Area variables) defined at the Area level
- $u_{0j}$ is the random error at the Area level
- $e_{ij}$ is the random error at the individual level
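To illustrate how the log-odds in the final model translate into a probability of LBW, the sketch below applies the inverse logit to a linear predictor made up of an intercept, covariate effects and an Area level random effect. All coefficient and covariate values are hypothetical, and the individual level error term is left out because the function returns an expected probability rather than a simulated outcome.

```python
import math


def predicted_probability(intercept, coefficients, covariates, area_effect=0.0):
    """Inverse-logit of the linear predictor for one birth in one Area.

    coefficients and covariates are paired element by element; area_effect
    stands for the Area level random effect of the birth's cluster.
    """
    linear_predictor = intercept + area_effect + sum(
        b * x for b, x in zip(coefficients, covariates)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical values: two individual level and one Area level covariate
p_low_risk_area = predicted_probability(-2.0, [0.5, 0.3, 0.4], [1, 0, 1], area_effect=-0.3)
p_high_risk_area = predicted_probability(-2.0, [0.5, 0.3, 0.4], [1, 0, 1], area_effect=0.3)
```

Two births with identical covariates can therefore have different predicted probabilities solely because they belong to different clusters.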
Parameter Estimation Method
The parameters to be estimated are the fixed coefficients $b_0$, $b_1$, etc. and the random parameter $\sigma_{u0}^2$.
In the multilevel model, the fixed effects (measures of association) refer to the individual and Area covariates and are expressed as adjusted odds ratios (AOR) with 95% confidence intervals. The random effects are the measures of variation in LBW across communities. The ratio of the Area level variance to the total variance is referred to as the intra-class correlation coefficient (ICC). Precision is measured by the standard error (SE) of the estimates for the independent variables [19]. The random effects are expressed as the variance partition coefficient (VPC), which in this study is equal to the ICC, and the proportional change in variance (PCV). Because the outcome variable is dichotomous, the VPC was calculated using the linear threshold model method, which converts the individual level variance from the probability scale to the logistic scale, on which the Area level variance is expressed [20]. In other words, under the linear threshold model the unobserved individual outcome variable follows a logistic distribution with individual level variance $\sigma_e^2$ equal to $\pi^2/3$ (≈3.29). In this case, the VPC corresponds to the ICC, which is a measure of the general clustering of the individual outcome of interest within communities.
The ICC is calculated as:
$$\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}$$
where the ICC is the proportion of the Area variance out of the total variance (Area plus individual variance), $\sigma_u^2$ is the Area level variance, and $\pi^2/3 \approx 3.29$ represents the fixed individual level variance.
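The ICC formula can be evaluated directly from an estimated Area level variance. The sketch below uses a hypothetical variance value; only the fixed individual level variance of π²/3 comes from the text.

```python
import math

INDIVIDUAL_VARIANCE = math.pi ** 2 / 3  # about 3.29 on the logistic scale


def icc_logistic(area_variance):
    """ICC for a two-level logistic model under the linear threshold method."""
    return area_variance / (area_variance + INDIVIDUAL_VARIANCE)


# Hypothetical Area level variance from an empty model
icc = icc_logistic(1.1)
print(round(icc, 3))
```

When the Area level variance equals π²/3, the ICC is exactly 0.5, and it approaches 0 as the Area level variance shrinks.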
Area differences in LBW may be attributable to contextual influences or to differences in the individual composition of communities (including unobserved individual characteristics) [47]. In view of this, adjusting for the individual characteristics in the multilevel model takes some part of the compositional differences into account and so explains some of the Area differences observed in the empty model. Thus the equation for the proportional change in Area variance is:
$$\mathrm{PCV} = \frac{V_{N1} - V_{N2}}{V_{N1}}$$
where $V_{N1}$ is the Area variance in the empty model and $V_{N2}$ is the Area variance in the model including either individual level characteristics, Area level characteristics, or both individual and Area level characteristics [20].
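As a worked illustration of the proportional change in variance, the sketch below compares a hypothetical Area variance from the empty model with a hypothetical variance from an adjusted model. Neither value is a result of this study.

```python
def proportional_change_in_variance(variance_empty, variance_adjusted):
    """Share of the empty-model Area variance explained by the added covariates."""
    return (variance_empty - variance_adjusted) / variance_empty


# Hypothetical Area variances: empty model versus adjusted model
pcv = proportional_change_in_variance(1.0, 0.75)
print(pcv)  # 0.25
```

A PCV of 0.25 would mean that the added covariates account for 25% of the Area level variance observed in the empty model.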
The Wald test was used to test the null hypothesis that a parameter is zero or that a group of parameters are jointly zero. The latter case applies when testing the significance of a categorical variable. Linear functions of parameters can also be tested. If the null hypothesis is true, the test statistic is approximately distributed as $\chi^2$ with r degrees of freedom, where r is the number of functions being tested [21]. Hence, the significance of the variation at each level was tested with the Wald test, and p-values <0.05 were considered sufficient to reject the null hypothesis.
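For a single parameter (r = 1), the Wald statistic is the squared ratio of the estimate to its standard error, and its chi-square p-value can be computed with the complementary error function. The sketch below covers only this one-parameter case; the estimate and standard error are hypothetical.

```python
import math


def wald_test_single(estimate, standard_error):
    """Wald chi-square statistic and p-value for H0: parameter = 0 (1 df).

    For 1 degree of freedom, P(chi2 > w) equals erfc(sqrt(w / 2)).
    """
    statistic = (estimate / standard_error) ** 2
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical coefficient and standard error
w, p = wald_test_single(0.98, 0.5)
print(round(w, 4), round(p, 4))
```

Joint tests of several parameters, as needed for a categorical variable with more than two levels, require the covariance matrix of the estimates and are not covered by this sketch.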
Model diagnostics
Multicollinearity was evaluated using the variance inflation factor (VIF); a VIF greater than 10 was taken as evidence of multicollinearity. Interaction effects between individual and Area level explanatory variables were also assessed [22].
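In general, the VIF of a predictor is 1/(1 - R²), where R² comes from regressing that predictor on all the others. The sketch below covers only the simplest case of two predictors, where R² reduces to the squared Pearson correlation between them; the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2), vif > 10)
```

With more than two predictors, each VIF would instead be computed from a separate regression of that predictor on the remaining ones.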
Model fit statistic
The receiver operating characteristic (ROC) curve was used to assess the overall accuracy of the model, using the area under the ROC curve. The ROC curve is a commonly used measure for summarizing the discriminatory ability of a binary prediction model.
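The area under the ROC curve equals the probability that a randomly chosen LBW case receives a higher predicted probability than a randomly chosen non-case, with ties counted as one half. The sketch below computes it by comparing all case and non-case pairs; the labels and predicted probabilities are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(non_cases))


# Hypothetical outcomes (1 = LBW) and predicted probabilities
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

An area of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination.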
Relative goodness of fit was assessed by computing Akaike’s information criterion (AIC) for each of the models and comparing the values across models. AIC was considered more appropriate than other methods for model selection in multilevel analysis, because multilevel data have different sample sizes at different levels [23].
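AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; the model with the lowest AIC is preferred. The sketch below compares four models using hypothetical log-likelihoods and parameter counts, which are not results of this study.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) for the four models
models = {
    "empty": (-520.0, 2),
    "individual": (-495.0, 8),
    "area": (-510.0, 5),
    "final": (-490.0, 11),
}
aic_values = {name: aic(ll, k) for name, (ll, k) in models.items()}
best_model = min(aic_values, key=aic_values.get)
print(aic_values, best_model)
```

In this illustration the final model has the lowest AIC even after the penalty for its additional parameters.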