Study setting and period
The study was conducted in Ethiopia, which is located in the Horn of Africa (north-eastern Africa) and lies between 3° and 15° North latitude and 33° and 48° East longitude.
This study used the 2016 Ethiopian Demographic and Health Survey (EDHS 2016) dataset. The survey was conducted by the Central Statistical Agency (CSA) in collaboration with the Federal Ministry of Health (FMoH) and the Ethiopian Public Health Institute (EPHI). The data were accessed from the DHS Program website (www.dhsprogram.com) through a registered personal account after justifying the reason for the request; permission was granted by email once the request had been reviewed. A cross-sectional study design using secondary data from EDHS 2016 was employed. All female youth (15–24 years old) were included irrespective of their sexual activity, giving a total weighted sample of 6143 female youth aged 15–24 years. The weight was generated following the EDHS recommendation: weight = v005/1,000,000.
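The weighting rule above can be illustrated with a minimal Python sketch. It is not the authors' Stata code; the three records and the field name `early_sex` are hypothetical and serve only to show how dividing `v005` by 1,000,000 feeds into a weighted count and a weighted proportion.

```python
def dhs_weight(v005):
    """Rescale the raw DHS weight variable v005 by 1,000,000."""
    return v005 / 1_000_000


def weighted_count(records):
    """Sum of rescaled weights, i.e. the weighted number of respondents."""
    return sum(dhs_weight(r["v005"]) for r in records)


def weighted_proportion(records, key):
    """Weighted share of respondents whose record[key] equals 1."""
    hits = sum(dhs_weight(r["v005"]) for r in records if r[key] == 1)
    return hits / weighted_count(records)


# Hypothetical records, not EDHS data.
records = [
    {"v005": 1_250_000, "early_sex": 1},
    {"v005": 750_000, "early_sex": 0},
    {"v005": 1_000_000, "early_sex": 1},
]

print(weighted_count(records))
print(weighted_proportion(records, "early_sex"))
```

A respondent with a rescaled weight above 1 counts for more than one person in the weighted total, which is why weighted sample sizes such as 6143 need not equal the number of interviews.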
The EDHS 2016 sample was stratified and selected in two stages. In the first stage, the sample was stratified by region, and each region was further stratified into urban and rural areas, yielding 21 sampling strata. A total of 645 enumeration areas (EAs; 202 urban and 443 rural) were selected with probability proportional to EA size within each sampling stratum. In the second stage, a fixed number of 28 households per cluster was selected by equal-probability systematic selection from the newly created household listing.
In this study, the outcome variable (early sexual initiation) was dichotomized (yes/no). Youth who started sexual activity at or before 18 years of age were considered as having early sexual initiation; those who started after 18 years of age, or who had not yet started, were considered as not having early sexual initiation. The outcome was generated from a constructed EDHS 2016 variable. The independent variables comprised individual-level variables (age, religion, khat chewing, alcohol drinking, wealth index, educational status, and media exposure) and community-level variables (region, residence, and four variables created by aggregating individual-level variables within each cluster: community-level education, community-level wealth index, community-level television exposure, and community-level radio exposure). The community-level wealth index was generated as the proportion of respondents in the two lowest wealth categories (poorest and poorer) out of all respondents in the same cluster. Similarly, community-level education was generated as the proportion of respondents in the two lowest categories of educational attainment (no education and primary education) out of all respondents in the same cluster. Community-level television exposure was computed as the proportion of respondents not exposed to television at all, and community-level radio exposure as the proportion not exposed to radio at all. Because none of these four variables was normally distributed, the median was used as the cutoff point to dichotomize them (above the median: female youth living in a cluster with a high proportion of poor households, low educational status, or low media exposure).
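The construction of a community-level variable can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the median is taken across cluster proportions (the text does not say whether it is taken over clusters or over individuals), it ignores sample weights for brevity, and the records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def cluster_proportions(records, cluster_key, flag):
    """Share of respondents in each cluster for whom flag(record) is true."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r[cluster_key]] += 1
        if flag(r):
            hits[r[cluster_key]] += 1
    return {c: hits[c] / totals[c] for c in totals}


def dichotomize_at_median(proportions):
    """Code a cluster 1 if its proportion is above the median, else 0."""
    cut = median(proportions.values())
    return {c: int(p > cut) for c, p in proportions.items()}


# Hypothetical records: three clusters with two respondents each.
records = [
    {"cluster": 1, "wealth": "poorest"}, {"cluster": 1, "wealth": "richer"},
    {"cluster": 2, "wealth": "poorer"}, {"cluster": 2, "wealth": "poorest"},
    {"cluster": 3, "wealth": "middle"}, {"cluster": 3, "wealth": "richest"},
]

poor_share = cluster_proportions(
    records, "cluster", lambda r: r["wealth"] in {"poorest", "poorer"}
)
high_poverty = dichotomize_at_median(poor_share)
print(poor_share)
print(high_poverty)
```

The same two functions would cover education, television, and radio exposure by swapping the `flag` predicate.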
Data processing and analysis
Data cleaning was conducted to check for consistency with the EDHS 2016 descriptive report. Recoding, variable generation, labeling, and analysis were done using STATA/SE version 14.0.
Descriptive statistics were used to describe the socio-demographic characteristics of the study participants and were presented in tables and text. The sample weight (gen wt = v005/1,000,000) was used to compensate for the unequal probability of selection between the geographically defined strata and for non-response. Multilevel analysis was conducted after checking, by means of the intra-cluster correlation coefficient (ICC), that the data were eligible for it. An ICC greater than 10% indicates that community-level factors affect the dependent variable; here the ICC was 22.5%. It is therefore worthwhile to identify community-level factors in order to design and implement appropriate interventions. Since EDHS data are hierarchical (individuals, "level 1", nested within communities, "level 2"), a two-level mixed-effects logistic regression model was fitted to estimate both the independent (fixed) effects of the explanatory variables and the community-level random effects on early sexual initiation among females aged 15–24 years. The log odds of early sexual initiation were modeled using a two-level multilevel model as follows:

$$\log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_0 + \beta_1 X_{ij} + \beta_2 Z_{ij} + \mu_j + e_{ij}$$

where i and j are the individual-level (level 1) and community-level (level 2) units, respectively; X and Z refer to individual-level and community-level variables, respectively; $\pi_{ij}$ is the probability of early sexual initiation for the ith youth in the jth community; and the β's indicate the fixed coefficients. $\beta_0$ is the intercept, the effect on the probability of early sexual initiation in the absence of influencing factors; $\mu_j$ is the random effect (the effect of the jth community on early sexual initiation); and $e_{ij}$ is the random error at the individual level. By assuming that each community had a different intercept ($\beta_0$) and fixed coefficients (β), the clustered nature of the data and the intra- and inter-community variations were taken into account.
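To make the role of the community random effect concrete, the sketch below converts the linear predictor of a two-level logistic model into a probability with the inverse logit. It is an illustration under stated assumptions: all coefficient values are hypothetical, single scalar covariates stand in for the X and Z vectors, and the individual-level error term is left out because, in a logistic model, individual variation is carried by the binary outcome itself.

```python
import math


def predicted_probability(beta0, beta1, x, beta2, z, u_j):
    """Inverse logit of the linear predictor beta0 + beta1*x + beta2*z + u_j."""
    eta = beta0 + beta1 * x + beta2 * z + u_j
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients; only the community effect u_j differs.
p_low = predicted_probability(-1.0, 0.5, 1, 0.3, 1, u_j=-0.5)
p_high = predicted_probability(-1.0, 0.5, 1, 0.3, 1, u_j=0.5)
print(p_low, p_high)
```

Two youth with identical covariates get different predicted probabilities purely because they live in communities with different random intercepts, which is the between-community variation the model is designed to capture.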
During the analysis, bivariable multilevel logistic regression was fitted first, and variables with a p-value less than 0.2 in model-I and model-II were selected to build the final model (model-III). The analysis was done in four models. The first, model-0 (the empty or null model), had no explanatory variables and was used to confirm the need for multilevel analysis. The second, model-I, included only individual-level variables; the third, model-II, included only community-level variables; and the last, model-III, included both individual-level and community-level variables selected on the basis of the cutoff point.
The measures of association (fixed effects), which estimate the association between the likelihood of early sexual initiation among female youth and the explanatory factors, were expressed as adjusted odds ratios (AOR) with their 95% confidence intervals. Variables with a p-value less than 0.05 in model-III were considered significantly associated with early sexual initiation. The random effects (variations) were measured using the ICC (model-0), the median odds ratio (MOR) (model-I and model-II), and the proportional change in variance (PCV), which was used to show variation between clusters.
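The ICC shows the variation in early sexual initiation among female youth that is due to community characteristics. The higher the ICC, the more relevant the community characteristics are for understanding individual variation in early sexual initiation. It is calculated as $ICC = \frac{\delta^2}{\delta^2 + \pi^2/3}$, where $\delta^2$ indicates the estimated variance of clusters and $\pi^2/3 \approx 3.29$ (with π the mathematical constant) is the individual-level variance of the standard logistic distribution.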
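As a worked check, the sketch below assumes the standard latent-variable formula for a multilevel logistic model, ICC = δ² / (δ² + π²/3), and inverts it to find the cluster variance implied by the reported ICC of 22.5%. The implied variance is derived arithmetic, not a figure reported by the authors.

```python
import math

# Individual-level variance of the standard logistic distribution (about 3.29).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc(cluster_variance):
    """Share of total latent variance attributable to clusters."""
    return cluster_variance / (cluster_variance + LOGISTIC_VARIANCE)


def variance_from_icc(icc_value):
    """Cluster variance implied by a given ICC (inverse of icc)."""
    return icc_value * LOGISTIC_VARIANCE / (1 - icc_value)


implied = variance_from_icc(0.225)
print(round(implied, 3))
print(round(icc(implied), 3))
```

Under this formula an ICC of 22.5% corresponds to a cluster-level variance just under 1 on the logit scale.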
The MOR is the median value of the odds ratio between the area at highest risk and the area at lowest risk when two areas are picked at random. It was calculated as $MOR = \exp\left(\sqrt{2 \times \delta^2} \times 0.6745\right) \approx \exp(0.95\delta)$. In this study, the MOR shows the extent to which the individual probability of early sexual initiation among female youth is determined by place of residence. The PCV measures the total variation attributed to individual-level and area (community)-level variables in the final model (model-III).
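It is calculated as $PCV = \frac{\delta^2_{null} - \delta^2_{A}}{\delta^2_{null}} \times 100\%$, where $\delta^2_{A}$ is the cluster-level variance of the fitted model and $\delta^2_{null}$, the cluster-level variance of the null model, is used as reference.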
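The two remaining random-effect measures can be sketched in a few lines. The formulas assumed here are the commonly used ones, MOR = exp(√(2δ²) × 0.6745) and PCV = (null variance − model variance) / null variance × 100; the variance values in the example are hypothetical, not results from this study.

```python
import math


def median_odds_ratio(cluster_variance):
    """MOR = exp(sqrt(2 * variance) * 0.6745), roughly exp(0.95 * sd)."""
    return math.exp(math.sqrt(2 * cluster_variance) * 0.6745)


def pcv(null_variance, model_variance):
    """Percentage of null-model cluster variance explained by the fitted model."""
    return (null_variance - model_variance) / null_variance * 100


# Hypothetical cluster-level variances for a null and an adjusted model.
null_var, adjusted_var = 1.0, 0.5
print(round(median_odds_ratio(null_var), 2))
print(pcv(null_var, adjusted_var))
```

A MOR of 1 means no between-cluster variation, and a larger PCV means the added covariates account for more of the cluster-level variance seen in the null model.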
Multicollinearity among the explanatory variables was checked using the standard error, with a cutoff of ±2. No multicollinearity was detected, as all standard errors were within ±2. The log-likelihood test was used to assess the goodness of fit of the final adjusted model (model-III) in comparison with the preceding models (model-I and model-II, the individual-level and community-level models, respectively).