Sources of data
EDHS 2016 is the fourth Demographic and Health Survey (DHS) of Ethiopia. It was implemented by the Central Statistical Agency (CSA) of Ethiopia together with other stakeholders, and the data were processed and organized into separate datasets by ICF International (13). The authors obtained these public datasets from the MEASURE DHS website with permission from MEASURE DHS. DHS data were collected following standard protocols and using three types of tools: the men’s, the women’s, and the household questionnaires. The questionnaires were also standardized by governmental and nongovernmental stakeholders to maintain the validity of the tools and datasets (12).
Study population and sampling procedures
Ethiopia has nine regions and two administrative towns. Regions are subdivided into Zones, and Zones are in turn subdivided into administrative units known as Weredas. Each Wereda is further subdivided into the smallest administrative units, called Kebeles. During the 2007 census, each Kebele was subdivided into census enumeration areas (EAs), which were convenient for the implementation of the census and subsequent surveys (9, 13).
EDHS 2016 followed a two-stage sampling design stratified into urban and rural areas. At the first stage, 645 EAs (443 rural and 202 urban) were selected from the sampling frame of EAs created for the 2007 Ethiopian Population and Housing Census (13). The second stage involved selecting households from the complete household listing in each selected EA, with probability proportionate to the size of each cluster (based on the 2007 Population and Housing Census) (9, 13). Approximately 28 households were selected from each cluster, giving a total of 18,008 households, of which 16,650 were interviewed. For the women’s questionnaire, women aged 15-49 in the selected households of the selected EAs were eligible. A total of 16,583 eligible women were identified, and 15,683 of them were interviewed. Of the 34,596 newborns included in EDHS 2016, only 2,110 were weighed at birth (13). Therefore, 2,110 newborns were included in the analysis for this study.
Variables description
The dependent variable was the birth weight of the newborn, classified as macrosomic (weighing more than 4,000 grams) or not. The independent variables considered for analysis were region, residence, sex of the newborn, mother’s age, mother’s education level, marital status, father’s education level, socio-economic class, mother’s BMI, gestational age (in weeks), and type of birth (singleton or multiple birth).
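The dichotomization of the dependent variable can be illustrated with a short sketch. This is not the authors' SPSS recoding; the function name and the example weights are hypothetical, and only the 4,000 gram cut-off comes from the text.

```python
# Cut-off taken from the variable description: macrosomic means > 4,000 g.
MACROSOMIA_THRESHOLD_G = 4000


def is_macrosomic(birth_weight_g: float) -> int:
    """Return 1 if the birth weight exceeds 4,000 g, otherwise 0."""
    return int(birth_weight_g > MACROSOMIA_THRESHOLD_G)


# Hypothetical birth weights in grams, for illustration only.
weights_g = [2500, 4000, 4001, 4500]
outcome = [is_macrosomic(w) for w in weights_g]
print(outcome)
```

Note that a newborn weighing exactly 4,000 grams is coded as not macrosomic under the "more than" definition.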
Method of data analysis
Variable extraction, data exploration, cleaning, coding and recoding, and descriptive statistics were performed using IBM SPSS version 23, whereas the inferential analysis was done using STATA version 15.
We considered the multilevel logistic regression model to assess regional variation and to identify the factors associated with high birth weight. This is because DHS surveys are based on multistage stratified cluster sampling and therefore have a hierarchical data structure (9, 13), so our data might be correlated within regions. If the classical logistic regression model were applied to these data, the standard errors might be underestimated, because that model does not account for the similarity of observations within the same region. Underestimated standard errors inflate the type I error rate, which implies a higher chance of concluding that results are significant even though they may not be (14).
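To give a rough sense of how within-region correlation can understate standard errors, the sketch below uses the Kish design effect, deff = 1 + (m - 1) * ICC, which assumes equal cluster sizes. This formula is not part of the authors' analysis. The ICC values are hypothetical, and treating the 2,110 newborns as spread evenly over 11 level-2 units (nine regions plus two administrative towns) is an assumption made only for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: float, cluster_size: float, icc: float) -> float:
    """Sample size of an independent sample carrying the same information."""
    return n / design_effect(cluster_size, icc)


n_newborns = 2110                        # analysis sample reported in the text
n_clusters = 11                          # assumption: 9 regions + 2 towns
avg_cluster_size = n_newborns / n_clusters

for icc in (0.0, 0.01, 0.05):            # hypothetical ICC values
    deff = design_effect(avg_cluster_size, icc)
    n_eff = effective_sample_size(n_newborns, avg_cluster_size, icc)
    # sqrt(deff) is the factor by which naive standard errors are too small
    print(f"ICC={icc:.2f}  deff={deff:.2f}  "
          f"SE inflation={math.sqrt(deff):.2f}  n_eff={n_eff:.0f}")
```

Even a small ICC produces a sizeable design effect when clusters are large, which is why ignoring the regional clustering can make results look more significant than they are.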
Multilevel models were developed to account for the dependency of observations within a cluster and to assess the intraclass correlation (the amount of dependency among individuals in the same cluster) (15-16). They incorporate random components for cluster effects in the statistical model, and including these cluster-level random effects makes it possible to estimate correct standard errors. By dividing the total variance in the dependent variable into between-cluster and within-cluster parts, multilevel models also allow the variability of random effects across clusters, and hence the importance of clusters, to be evaluated (16-18). Both observation-level and cluster-level covariates can be included. In addition, multilevel models separate the estimated covariate effects into different levels, which can be interpreted as observation-level effects (i.e., within a cluster) and cluster-level effects (i.e., across clusters), respectively (14, 16).
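For a multilevel logistic model, one common way to express the intraclass correlation is the latent-variable formulation, in which the level-1 residual variance is fixed at pi^2 / 3 (the variance of the standard logistic distribution). The source does not state which ICC definition was used, so the sketch below is an assumption, and the variance values passed to it are hypothetical.

```python
import math


def latent_icc(between_cluster_variance: float) -> float:
    """ICC under the latent-variable formulation of a random-intercept
    logistic model: var_u / (var_u + pi^2 / 3)."""
    level1_variance = math.pi ** 2 / 3   # variance of the standard logistic
    return between_cluster_variance / (between_cluster_variance + level1_variance)


# Hypothetical between-region variances of the random intercept.
for var_u in (0.0, 0.2, 1.0):
    print(f"var_u={var_u:.1f}  ICC={latent_icc(var_u):.3f}")
```

An ICC of zero means that regions explain none of the variation in the outcome, while larger values indicate stronger clustering within regions.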
Thus, the authors applied a multilevel logistic regression model that takes into account the correlation of individuals within the same region. The hierarchy for this study has individuals at level 1 and regions at level 2.
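A two-level model of this kind is often written in random-intercept form, as sketched below. The text does not say whether random slopes were also included, so the random-intercept specification and the notation are assumptions: y_{ij} is the macrosomia indicator for newborn i in region j, \pi_{ij} = P(y_{ij} = 1), x_{kij} are the p covariates, and u_j is the region-level random effect with variance \sigma_u^2.

```latex
\begin{align}
\operatorname{logit}(\pi_{ij})
  = \log\frac{\pi_{ij}}{1 - \pi_{ij}}
  &= \beta_0 + \sum_{k=1}^{p} \beta_k x_{kij} + u_j, \\
u_j &\sim N\left(0, \sigma_u^2\right).
\end{align}
```

Under this form, \sigma_u^2 = 0 reduces the model to the classical logistic regression, so the estimated variance of u_j indicates how much the log-odds of high birth weight vary between regions.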