2.1 Source of Data
The data for this study were extracted from the published reports of the Ethiopia Demographic and Health Survey [9]. The 2016 Ethiopia Demographic and Health Survey (EDHS) was designed to provide data for monitoring the population and health situation in Ethiopia. EDHS 2016 is the fourth Demographic and Health Survey conducted in Ethiopia since 2000. The objective of the survey is to provide reliable estimates of fertility, marriage, sexual activity, reproductive preferences, family planning methods, breastfeeding practices, nutrition, childhood and maternal mortality, maternal and child health, HIV/AIDS and other sexually transmitted infections (STIs), women's empowerment, female genital mutilation/cutting, and domestic violence that program managers and decision makers can use to evaluate and improve existing programs.
2.2 Sampling Design
The sampling frame used for EDHS 2016 was the Ethiopian Population and Housing Census (PHC), conducted in 2007 by the Central Statistical Agency (CSA) of Ethiopia. The census frame is a complete list of 84,915 enumeration areas (EAs) created for the 2007 PHC. Administratively, Ethiopia is divided into nine geographic regions and two administrative cities. The 2016 EDHS sample was designed to provide estimates of key indicators for the country as a whole, for urban and rural areas separately, and for each of the nine regions and two administrative cities.
The 2016 EDHS sample was stratified and selected in two stages. Each region was stratified into urban and rural areas, yielding 21 sampling strata. Samples of EAs were selected independently in each stratum in two stages. Implicit stratification and proportional allocation were achieved at each of the lower administrative levels by sorting the sampling frame within each sampling stratum before sample selection, according to administrative units in different levels, and by using a probability proportional to size selection at the first stage of sampling.
In the first stage, a total of 645 EAs (202 in urban areas and 443 in rural areas) were selected with probability proportional to EA size (based on the 2007 PHC) and with independent selection in each sampling stratum. A household listing operation was carried out in all selected EAs from September to December 2015. The resulting lists of households served as the sampling frame for selecting households in the second stage. All women aged 15-49 who were permanent residents of the selected households or visitors who stayed in the household the night before the survey were eligible to be interviewed. In the interviewed households, 16,583 eligible women were identified for individual interviews; interviews were completed with 15,683 women, yielding a response rate of 95 percent [9].
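The first-stage selection described above can be illustrated with a minimal sketch of systematic probability-proportional-to-size sampling within a single stratum. The frame, the EA identifiers, and the household counts below are hypothetical, and the actual EDHS selection procedure may differ in detail.

```python
import random

def pps_systematic(frame, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    frame: list of (unit_id, size) tuples, already sorted in the order that
           provides implicit stratification (e.g. by administrative unit).
    """
    total = sum(size for _, size in frame)
    interval = total / n                       # sampling interval
    start = rng.random() * interval            # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected, cumulative, t = [], 0, 0
    for unit_id, size in frame:
        cumulative += size
        # A unit is selected when a target falls inside its cumulative range.
        while t < n and targets[t] < cumulative:
            selected.append(unit_id)
            t += 1
    return selected

# Hypothetical stratum: 50 EAs with household counts between 100 and 300.
rng = random.Random(2016)
frame = [(f"EA{i:03d}", rng.randint(100, 300)) for i in range(50)]
sample = pps_systematic(frame, n=5, rng=rng)
print(sample)
```

Because the frame is sorted before selection, the fixed sampling interval spreads the selected EAs across the sorted administrative units, which is what gives the implicit stratification.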
2.3 Variables in the Study
The Response Variable
The response variable is the time to first birth, measured in years. For the analysis, women who had given birth were coded 1 (event occurred) and women who had not given birth were coded 0 (censored).
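The coding of the response can be sketched as follows. The field names (`age_at_first_birth`, `current_age`) are hypothetical and are not EDHS variable names. The sketch also assumes that a woman with no birth is censored at her age at interview, which the text does not state explicitly.

```python
def code_survival(record):
    """Return (time_in_years, event) for one woman.

    event = 1 if a first birth was observed, 0 if censored.
    Assumption: women without a birth are censored at their current age.
    """
    if record["age_at_first_birth"] is not None:
        return record["age_at_first_birth"], 1
    return record["current_age"], 0

# Hypothetical records, for illustration only.
women = [
    {"current_age": 30, "age_at_first_birth": 19},
    {"current_age": 22, "age_at_first_birth": None},
]
coded = [code_survival(w) for w in women]
print(coded)  # [(19, 1), (22, 0)]
```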
Explanatory Variables
Several predictors were considered in this study to investigate the determinants of age at first birth. These were education, region, religion, work status, wealth index, place of residence, age at first marriage, age at first sex, and contraceptive use.
2.4 Methods of Data Analysis
The Survival Model
Survival analysis is a set of statistical procedures for data in which the outcome variable of interest is the time until an event occurs. By time we mean the years, months, weeks or days from the start of follow-up of an individual until an event occurs; alternatively, time can refer to the individual's age when the event occurs. By event we mean death, disease incidence, relapse from remission, recovery (e.g. return to work), or any other designated experience of interest that may happen to an individual. The problem of analyzing time-to-event data arises in several application areas such as medicine, biology, public health, epidemiology, engineering, economics, sociology, and demography. The terms lifetime analysis, duration analysis, event history analysis, failure-time analysis, reliability analysis, and transition analysis refer essentially to the same group of techniques, although the emphasis on certain modeling aspects may differ across disciplines [10].
Survival analysis, as distinct from other statistical methods, is most important when some subjects are lost to follow-up or when the observation period ends before some subjects have experienced the event of interest. In either case, we do not have complete information about these subjects. Such incomplete observations are said to be censored. Censoring is the major analytical problem that most survival analyses must address. It occurs when we have some information about an individual's survival time but do not know the survival time exactly. It can arise because an individual did not experience the event before the end of the study, was lost to follow-up during the study, or withdrew from the study for known or unknown reasons. There are three types of censoring:
- Right censoring: A survival time is right censored when the individual is observed from the time origin up to some point before the event occurs, so that the true survival time is known only to exceed the observed time. This is the most common type of censoring in survival analysis and is the type considered in this study.
- Left censoring: A survival time is left censored if the individual has already experienced the event of interest before the start of the study.
- Interval censoring: A survival time is interval censored when it is known only that the event of interest occurred within some period of time, but the exact time of occurrence is not known.
2.4.1 Cox PH Regression Model
A semi-parametric model for the hazard function was proposed in [11]. It allows covariates to be included while leaving the baseline hazard unspecified, and it constrains the hazard to take only positive values. With this parameterization, the Cox hazard function is
$${h}_{i}\left(t\right)={h}_{o}\left(t\right)\exp\left({X}_{i}^{T}\beta \right)$$ (2.1)
where \({h}_{o}\left(t\right)\) is the baseline hazard function, obtained when all covariates are set to zero; \({X}_{i}\) is the vector of covariates for individual i; and \(\beta\) is a vector of regression parameters.
In this model, no distributional assumption is made for the survival time; the only assumption is that the hazard ratio does not change over time (i.e., proportional hazard model). Even though the baseline hazard is not specified, we can still get a good estimate for the regression coefficients, β, hazard ratio, and adjusted hazard curves.
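A minimal sketch of the model is given below. It evaluates the hazard for two covariate vectors and shows that their ratio is the same at every time point. The baseline hazard, the coefficients, and the covariate values are hypothetical and were chosen only for illustration; they are not estimates from the EDHS data.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """Hazard h(t | x) = h_o(t) * exp(x' beta) under the Cox PH model."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return baseline(t) * math.exp(linear_predictor)

def baseline(t):
    """Hypothetical baseline hazard; any positive function of t would do."""
    return 0.01 * t

beta = [0.5, -0.3]                  # hypothetical regression coefficients
x_a, x_b = [1.0, 2.0], [0.0, 1.0]   # hypothetical covariate vectors

# The baseline hazard cancels, so the ratio does not depend on t.
ratios = [
    cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
    for t in (15, 20, 30)
]
print(ratios)
```

Replacing `baseline` with any other positive function of time leaves the ratios unchanged, which is why the regression coefficients can be interpreted without specifying the baseline hazard.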
The hazard ratio of two individuals with different covariates \(X\) and \({X}^{*}\) is given by:
\(\widehat{HR}=\frac{{h}_{o}\left(t\right)\exp\left({X}^{T}\widehat{\beta }\right)}{{h}_{o}\left(t\right)\exp\left({X}^{*T}\widehat{\beta }\right)}=\exp\left\{{\left(X-{X}^{*}\right)}^{T}\widehat{\beta }\right\}\) (2.2)
This hazard ratio does not depend on time, which is why the model is called the proportional hazards model. For a categorical covariate, the exponentiated coefficient is the hazard ratio of one group relative to the reference group. For a continuous covariate, it is the multiplicative change in the hazard for a unit increase in the covariate. In both cases the other covariates are held fixed.
The hazard ratio for a one-unit increase in a continuous covariate \({x}_{k}\) is given by:
\(\frac{{h}_{i}\left(t, {x}_{k}+1\right)}{{h}_{i}\left(t, {x}_{k}\right)}=\exp\left({\beta }_{k}\right)\) (2.3)
which represents the multiplicative change in the hazard for a unit increase in the covariate when the other covariates are held constant.
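The interpretation of a coefficient can be checked numerically with a short sketch of the hazard ratio in equation (2.2). The coefficients and covariate values are hypothetical, and the labelling of the first covariate as age at first marriage is only an example.

```python
import math

def hazard_ratio(x, x_star, beta):
    """exp{ sum_k beta_k * (x_k - x*_k) }, as in equation (2.2)."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x, x_star)))

# Hypothetical coefficients: beta[0] for a continuous covariate
# (say, age at first marriage), beta[1] for a binary covariate.
beta = [-0.08, 0.40]

x_star = [18.0, 1.0]
x = [19.0, 1.0]          # one unit higher on the first covariate only

hr = hazard_ratio(x, x_star, beta)
print(hr, math.exp(beta[0]))   # the two values agree
print(100 * (hr - 1))          # percent change in the hazard per unit increase
```

A negative coefficient gives a hazard ratio below one, that is, a lower hazard of first birth for each additional unit of the covariate.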
For a categorical explanatory variable X with a levels, the model contains (a-1) dummy variables defined as Di = 1 if X = i and 0 otherwise, for i = 1,2,…,a-1, so that level a serves as the reference category. Let \({\beta }_{1}, {\beta }_{2},\dots , {\beta }_{a-1}\) denote the coefficients of the dummy variables. For two subjects, one with X at level j and the other at level k (j,k = 1,2,…,a-1), whose values of all other explanatory variables are the same, the hazard ratio between the two categories is given by:
\(\frac{h\left(t\mid {D}_{j}\right)}{h\left(t\mid {D}_{k}\right)}=\frac{\exp\left({\beta }_{j}\right)}{\exp\left({\beta }_{k}\right)}=\exp\left({\beta }_{j}-{\beta }_{k}\right)\) (2.4)
The quantity \(\exp\left({\beta }_{j}-{\beta }_{k}\right)\) is the ratio of the hazard for a subject at level j to the hazard for a subject at level k, with the other covariates held fixed. Multiplied by 100%, it expresses the hazard at level j as a percentage of the hazard at level k.
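The dummy coding and the resulting hazard ratios between levels can be sketched as follows. The number of levels and the coefficient values are hypothetical and are used only to illustrate equation (2.4).

```python
import math

def dummy_code(level, a):
    """Return the (a-1) indicators D_1..D_{a-1}; level a is the reference."""
    return [1 if level == i else 0 for i in range(1, a)]

def level_hazard_ratio(j, k, beta, a):
    """Hazard ratio of level j versus level k, other covariates equal."""
    dj, dk = dummy_code(j, a), dummy_code(k, a)
    return math.exp(sum(b * (u - v) for b, u, v in zip(beta, dj, dk)))

# Hypothetical coefficients for a covariate with a = 4 levels.
a = 4
beta = [0.20, -0.10, 0.35]      # beta_1, beta_2, beta_3

print(dummy_code(2, a))         # [0, 1, 0]
hr_13 = level_hazard_ratio(1, 3, beta, a)    # exp(beta_1 - beta_3)
hr_1ref = level_hazard_ratio(1, 4, beta, a)  # exp(beta_1): level 1 vs reference
print(hr_13, hr_1ref)
```

When k is the reference level, all of its dummy variables are zero and the ratio reduces to the exponentiated coefficient of level j.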