Nigeria is made up of 36 states and a Federal Capital Territory, grouped into six geopolitical zones. It is the most populous country in Africa, yet only the 14th largest in land mass [23], which implies a high population density. The 2006 Population and Housing Census placed the country’s population at 140,431,790, and the 2019 projection, which used the 2006 census figure as its base, was above 200 million [24]. Nigeria is a multi-ethnic country whose three major ethnic groups are the Hausa/Fulani, Igbo and Yoruba. Cultural practices among all ethnic groups in Nigeria favour childbearing. However, the level of fertility varies because of cultural diversity and differences in some of the sociocultural factors that influence fertility. Polygamy is very common among Muslims, and early marriage is prevalent in the northern part of Nigeria. Nigeria is considered a developing nation, characterized by a high poverty rate and a low literacy level.
Study design and data
The study was cross-sectional in design and based on nationally representative samples covering all six geopolitical zones in Nigeria. The analysis used the 2003, 2008, 2013 and 2018 rounds of the Nigeria Demographic and Health Survey (NDHS). The 2003 NDHS used the sampling frame designed for the 1991 population census, while the 2008, 2013 and 2018 NDHS used the frame designed for the 2006 Population and Housing Census, modified to account for the growth in the number of households between the census and the survey year. In all survey rounds the primary sampling unit (PSU) was a cluster defined by the Enumeration Areas (EAs) of the 1991 and 2006 census sampling frames. Samples for the 2003 and 2008 surveys were selected using a stratified two-stage cluster design consisting of 365 clusters in the 2003 NDHS [25] and 888 clusters in the 2008 NDHS [26]. The 2013 and 2018 NDHS used three-stage and two-stage designs, respectively. For the 2013 NDHS, 893 localities were selected at the first stage with probability proportional to size and with independent selection in each sampling stratum. At the second stage, one EA was randomly selected from most of the selected localities; in a few larger localities, more than one EA was selected. In total, 904 EAs were selected. After the selection of the EAs and before the main survey, a household listing operation was carried out in all of the selected EAs [27]. For the 2018 NDHS, 1400 EAs were selected at the first stage, and a household listing, which served as the sampling frame, was conducted in the selected EAs. At the second stage, 30 households were selected from each cluster by equal-probability systematic sampling [23].
The data were highly comparable over time because sampling procedures, data collection methodologies and coding were standardized. The numbers of households interviewed in 2003, 2008, 2013 and 2018 were 7864, 34070, 40680 and 42000, respectively. The numbers of women aged 15-49 years interviewed in these survey years and used in the study were 7620, 33385, 38948 and 41821, respectively. The analyses were based on these secondary data, accessed through the web platform of the data originators. Further analyses were carried out in line with the study objectives.
Sample weights were applied to each case to adjust for differences in the probability of selection. Weighting is important because it makes the sample more representative and reduces the error associated with sample selection bias.
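As an illustration of how weighting changes an estimate, the sketch below compares an unweighted and a weighted proportion on made-up records. The function name and the toy values are hypothetical. The division by 1,000,000 reflects the convention that DHS recode files store the women's sample weight (v005) multiplied by 1,000,000. The sketch does not reproduce the study's actual estimation procedure.

```python
def weighted_proportion(flags, weights):
    """Weighted share of cases whose flag is True."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for f, w in zip(flags, weights) if f) / total


# Hypothetical toy records: (high-fertility flag, raw weight as stored in the file).
raw = [(True, 1_500_000), (False, 500_000), (False, 1_000_000), (True, 1_000_000)]

flags = [flag for flag, _ in raw]
# Rescale the stored weights before use (stored value / 1,000,000).
weights = [w / 1_000_000 for _, w in raw]

unweighted = sum(flags) / len(flags)            # every woman counts equally
weighted = weighted_proportion(flags, weights)  # women count in proportion to their weight
```

In this toy example the women flagged as high fertility carry larger weights, so the weighted proportion is higher than the unweighted one.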
Variable description
The dependent variable was fertility, measured by the total number of children ever born (CEB). CEB, a measure of lifetime fertility, was obtained from the full birth histories provided by women aged 15-49 years. It is recorded as a discrete count in the DHS data set. In this study, however, it was dichotomized as low if a woman had fewer than 5 children and high otherwise. The categorization was based on the 1988 population policy, revised in 2004, which emphasized the need to maintain four children per family [28].
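The dichotomization described above can be sketched as follows. The function and constant names are hypothetical and serve only to illustrate the cut-off of 5 children.

```python
HIGH_FERTILITY_THRESHOLD = 5  # cut-off stated in the text: fewer than 5 children is "low"


def fertility_category(ceb):
    """Classify a count of children ever born (CEB) as 'low' or 'high'."""
    if ceb < 0:
        raise ValueError("CEB cannot be negative")
    return "high" if ceb >= HIGH_FERTILITY_THRESHOLD else "low"


# Hypothetical CEB values for five women.
categories = [fertility_category(c) for c in [0, 3, 4, 5, 9]]
```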
The main explanatory variable was women’s education. We used literacy to denote women’s education in this study because it is essential in measuring a population’s level of education. Literacy is defined as the ability to both read and write a short, simple statement about one’s own life. We therefore categorized women as having no formal education (illiterate) if they could not read and write and had not completed primary education, and as educated (literate) if they could read and write and had at least completed primary education. The other variables used were maternal age (15-19, 20-24, …, 45-49), age at first birth, age at first marriage, place of residence (urban, rural), religion, wealth index and modern contraceptive use. Age at first marriage (v511) and age at first birth (v212) were recorded in years. We re-categorized them as < 20 years (teenagers) and ≥ 20 years. Religion was categorized as Christian, Islam and others. Likewise, the wealth index was regrouped as poor, middle and rich.
Statistical analyses
Descriptive statistics were used to describe the distribution of respondents by the explanatory variables. The difference in mean CEB between uneducated and educated women was examined using the Mann-Whitney test because the distribution of CEB was skewed. We used the direct method to produce total fertility rates (TFR) by educational status; the method has been published elsewhere [29]. We also calculated the parity progression ratio (PPR), the proportion of women who move from one parity to the next higher parity, for illiterate and literate women. We computed the risk difference in high fertility (≥5) between women who were uneducated and those who were educated. A risk difference (RD) greater than 0 (RD > 0) suggests that high fertility was more prevalent among women with no formal education (pro-illiterate inequality). Conversely, a negative risk difference indicates that high fertility was more prevalent among educated women (pro-literate inequality). Finally, we used the logistic regression method to conduct the Blinder-Oaxaca decomposition analysis [30, 31]. We chose this method because it allows quantification of the gap between the “advantaged” and the “disadvantaged” groups.
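The risk difference and the parity progression ratio can be illustrated with a minimal, unweighted sketch. The function names and the toy CEB values are hypothetical. The sign convention follows the text (illiterate minus literate, so RD > 0 is pro-illiterate). The PPR is computed here as the number of women with at least i + 1 children divided by the number with at least i children, which is one common way to operationalize the definition given above.

```python
def risk_difference(ceb_illiterate, ceb_literate, threshold=5):
    """Share with CEB >= threshold among illiterate women minus the share among literate women."""
    if not ceb_illiterate or not ceb_literate:
        raise ValueError("both groups must contain at least one woman")
    p_illiterate = sum(1 for c in ceb_illiterate if c >= threshold) / len(ceb_illiterate)
    p_literate = sum(1 for c in ceb_literate if c >= threshold) / len(ceb_literate)
    return p_illiterate - p_literate


def parity_progression_ratio(parities, i):
    """Proportion of women at parity i or above who went on to parity i + 1 or above."""
    at_least_i = sum(1 for p in parities if p >= i)
    if at_least_i == 0:
        raise ValueError("no woman has reached parity %d" % i)
    at_least_next = sum(1 for p in parities if p >= i + 1)
    return at_least_next / at_least_i


# Hypothetical CEB values for two small groups of women.
illiterate = [6, 5, 3, 7]
literate = [2, 4, 5, 1]

rd = risk_difference(illiterate, literate)          # positive: pro-illiterate inequality
ppr_5_to_6 = parity_progression_ratio(illiterate, 5)
```

A survey-based estimate would apply the sample weights to each count rather than treating every woman equally.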
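The Blinder-Oaxaca decomposition assumes that y is explained by a vector of determinants, x, according to a logistic regression model fitted separately for the two groups being compared, p and q:

$$
y_i =
\begin{cases}
\beta^{p} x_i + \varepsilon_i^{p} & \text{if } i \in p \\
\beta^{q} x_i + \varepsilon_i^{q} & \text{if } i \in q
\end{cases}
$$

where the vectors of β parameters include intercepts.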
The gap between the mean outcomes, $\bar{y}^{q}$ and $\bar{y}^{p}$, is

$$
\bar{y}^{q} - \bar{y}^{p} = \beta^{q} \bar{x}^{q} - \beta^{p} \bar{x}^{p}
$$

where $\bar{x}^{q}$ and $\bar{x}^{p}$ are the vectors of explanatory variables evaluated at the means for groups q and p. In this study the explanatory variables were maternal age, educational status, religion, wealth index, place of residence, age at first marriage, age at first birth and ever use of modern contraceptives.
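If there are just two x’s, x1 and x2, the gap can be written as follows:

$$
\bar{y}^{q} - \bar{y}^{p}
= \left(\beta_0^{q} - \beta_0^{p}\right)
+ \left(\beta_1^{q} \bar{x}_1^{q} - \beta_1^{p} \bar{x}_1^{p}\right)
+ \left(\beta_2^{q} \bar{x}_2^{q} - \beta_2^{p} \bar{x}_2^{p}\right)
= G_0 + G_1 + G_2
$$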
The gap in y between p and q can therefore be attributed to:
- differences in the intercepts (G0)
- differences in x1 and β1 (G1)
- differences in x2 and β2 (G2)
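To estimate how much of the overall gap, or of the gap specific to any one of the x’s, is attributable to differences in the x’s rather than to differences in the β’s, the gap between the two outcomes was expressed as:

$$
\bar{y}^{q} - \bar{y}^{p}
= \Delta x \, \beta^{p} + \Delta\beta \, \bar{x}^{p} + \Delta x \, \Delta\beta
= E + C + CE
$$

where $\Delta x = \bar{x}^{q} - \bar{x}^{p}$ and $\Delta\beta = \beta^{q} - \beta^{p}$.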
The gap in the mean outcomes thus arises from a gap in endowments (E), the part that is due to group differences in the magnitudes of the determinants of the outcome; a gap in coefficients (C), the part that is due to group differences in the effects of these determinants; and a gap arising from the interaction of endowments and coefficients (CE).
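The three components can be illustrated with a minimal sketch. It rests on simplifying assumptions that differ from the study: it uses a single explanatory variable, and it fits an ordinary least squares line (a linear probability model) to each group instead of the logistic regression used in the analysis. With an intercept in the model, the fitted line passes through the group means, so E, C and CE add up exactly to the gap in mean outcomes. All names and data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary within the group")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def threefold_decomposition(x_p, y_p, x_q, y_q):
    """Split the gap mean(y_q) - mean(y_p) into endowments, coefficients and interaction."""
    b0_p, b1_p = fit_line(x_p, y_p)
    b0_q, b1_q = fit_line(x_q, y_q)
    xbar_p = sum(x_p) / len(x_p)
    xbar_q = sum(x_q) / len(x_q)
    dx = xbar_q - xbar_p   # difference in mean endowments
    db0 = b0_q - b0_p      # difference in intercepts
    db1 = b1_q - b1_p      # difference in slopes
    return {
        "gap": sum(y_q) / len(y_q) - sum(y_p) / len(y_p),
        "E": dx * b1_p,             # endowments, valued at group p's coefficient
        "C": db0 + db1 * xbar_p,    # coefficients, valued at group p's mean
        "CE": dx * db1,             # interaction of the two differences
    }


# Hypothetical data: x is a covariate, y a high-fertility indicator (0 or 1).
result = threefold_decomposition(
    x_p=[0, 1, 2, 3], y_p=[0, 0, 1, 1],
    x_q=[1, 2, 3, 4], y_q=[0, 1, 1, 1],
)
```

Because the intercept regressor is the constant 1 in both groups, it contributes nothing to E or CE and enters only through C.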