Study setting and design
A retrospective study was conducted using data obtained from the Oromia Regional State Police Commission Bureau, Traffic Control and Crime Investigation Department. The study included road traffic accidents reported from July 2016 to July 2017. The study was conducted in Oromia Regional State, which is the largest region of the country. The major towns of the region include Adama, Ambo, Asella, Bale Robe, Bishoftu, Fiche, Goba, Jimma, Metu, Nekemte, Sebeta, Sululta, Shashemene, and Wolliso, among many others [15]. The Oromia region has the highest traffic movement in the country after Addis Ababa. The major road from Addis Ababa to Djibouti passes through this region, as do all major roads between Addis Ababa and the capital cities of the other regions.
Study population, sample size and sampling techniques
All registered road traffic accidents from July 2016 to July 2017 that were documented by the Oromia Regional State Police Commission Bureau, Traffic Control and Crime Investigation Department, were included in the sample.
Variables of the study
The response variable of this study was the number of human deaths per road traffic accident in the Oromia region, measured as 0, 1, 2, 3… Human deaths include pedestrians, road users, passengers, or other people who died as a result of a road traffic accident. The potential predictors of the number of human deaths per road traffic accident, adopted from the literature [16-22], were demographic characteristics (sex of driver, age of driver), environmental characteristics (accident time, road condition, environment of accident, weather condition, location of accident, day of accident), vehicle-related characteristics (vehicle type, vehicle length of service), road-related characteristics (road pavement), driving experience, education level of driver, driver-vehicle relationship, accident cause, and accident type (collision between vehicles, collision between vehicle and pedestrian, collision between vehicle and animal, vehicle upside down).
Data collection process
Data were collected from the Oromia Regional State Police Commission Bureau, Traffic Control and Crime Investigation Department, using a checklist that was prepared based on the road traffic control and registry format.
Statistical data analysis
We used STATA version 13 [23] to analyze the data. We described the distribution of the data using tables, percentages, mean, variance, skewness, and kurtosis. Count regression models were used to model the discrete response variable (number of human deaths per road traffic accident), measured as 0, 1, 2, 3… [24].
Count regression models, unlike linear regression, have counts as the response variable. Counts can take only nonnegative integer values, 0, 1, 2, 3…, measured in natural units on a fixed scale, and represent the number of times an event (here, the number of human deaths per road traffic accident) occurs in a fixed domain. For count data, the standard framework for explaining the relationship between the outcome variable and a set of predictors includes the standard count regression models (Poisson regression and negative binomial (NB) regression), zero-inflated models (zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB)), and hurdle models (hurdle Poisson (HP) and hurdle negative binomial (HNB)). All six models were used in this study.
The conventional Poisson regression model for count data is often of limited use because empirical count data sets typically exhibit over-dispersion and/or have an excess number of zeros. Over-dispersion can be addressed by extending the ordinary Poisson regression model with a dispersion term, which gives the negative binomial regression model from the family of generalized linear models (GLMs) [25, 26]. However, although such models typically capture over-dispersion rather well, in many applications they are not sufficient for modeling excess zeros. Zero-augmented models address this issue by explicitly modeling the zero counts [27, 28]. Zero-inflated models [27] are mixture models that combine a count component and a point mass at zero. Hurdle models [28] combine a left-truncated count component with a right-censored hurdle component. A comprehensive and up-to-date account of count models and methods, as well as the interpretation of fitted count models, is provided by [29].
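The two features discussed above, over-dispersion and excess zeros, can be screened for before any model is fitted. The following is a minimal sketch in Python rather than the STATA code used in the study. The function name and the data are hypothetical and serve for illustration only. It compares the sample variance with the sample mean and the observed share of zeros with the share a Poisson distribution with the same mean would imply.

```python
import math

def dispersion_and_zero_check(counts):
    """Compare sample moments and zero share with what a Poisson model implies."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    observed_zero_share = sum(1 for c in counts if c == 0) / n
    # Under a Poisson model with this mean, P(Y = 0) = exp(-mean).
    poisson_zero_share = math.exp(-mean)
    return {
        "mean": mean,
        "variance": variance,
        "variance_to_mean": variance / mean,  # above 1 suggests over-dispersion
        "observed_zero_share": observed_zero_share,
        "poisson_zero_share": poisson_zero_share,
    }

# Hypothetical deaths per accident, for illustration only (not study data).
deaths = [0, 0, 0, 1, 0, 2, 0, 1, 0, 4]
check = dispersion_and_zero_check(deaths)
for name, value in check.items():
    print(f"{name}: {value:.4f}")
```

A variance-to-mean ratio well above one, or an observed zero share well above the Poisson-implied share, would motivate the NB, zero-inflated, and hurdle models described next.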
Poisson Regression Model
The Poisson regression model is the most common technique used to model count data. Consider a set of p predictors denoted by the vector x′ = [x1, x2,…,xp], and let yi represent the observed counts of events (number of deaths) of the random variable Y occurring in a given time or exposure period with rate μ. Then the probability mass function of a Poisson random variable Y is given by:
See formula 1 in the supplementary files.
where 𝑦i = 0, 1, 2, 3… are discrete counts (the number of human deaths per accident) and 𝜇 is the rate parameter [29].
The relationship between the predictors and the non-negative mean parameter 𝜇i is given by the exponential specification:
E(𝑦i ) = 𝜇i = exp (𝑥i′𝛽)
where 𝑥i′ = (1, 𝑥𝑖1, … , 𝑥𝑖𝑝) is a vector of explanatory variables and 𝛽 = (𝛽0, 𝛽1, 𝛽2, … , 𝛽𝑝)′ is the corresponding (𝑝 + 1)-dimensional column vector of unknown parameters to be estimated.
The unknown parameters of the model were estimated by maximum likelihood, that is, by maximizing the log-likelihood function.
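To make the estimation step concrete, the sketch below writes out the standard Poisson probability mass function, the log-likelihood under the exponential mean specification, and a Newton-Raphson maximization for a model with an intercept and a single predictor. It is an illustrative Python sketch, not the STATA routine used in the analysis. The function names, the single binary predictor, and the data are assumptions introduced for this example.

```python
import math

def poisson_pmf(y, mu):
    """Standard Poisson probability P(Y = y) with rate mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def log_likelihood(beta, xs, ys):
    """Poisson log-likelihood with mean mu_i = exp(b0 + b1 * x_i)."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total

def fit_poisson(xs, ys, iterations=25):
    """Maximize the log-likelihood by Newton-Raphson (two parameters)."""
    b0 = math.log(sum(ys) / len(ys))  # start at the intercept-only estimate
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # score for b0
            g1 += (y - mu) * x      # score for b1
            h00 += mu               # entries of the negative Hessian
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (negative Hessian)^(-1) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x is a binary predictor, y is deaths per accident.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 1, 0, 1, 2, 1, 3, 2]
b0, b1 = fit_poisson(xs, ys)
print("exp(b0):", math.exp(b0), "exp(b1):", math.exp(b1))
```

With a single binary predictor, exp(b0) reproduces the mean count in the reference group and exp(b1) the ratio of the two group means, which is how exponentiated coefficients are usually read as rate ratios.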
Negative Binomial Regression Model
Given that over-dispersion is the norm, the negative binomial model has more generality than the Poisson model. Over-dispersion is most often caused by highly skewed response variables, often due to a large number of zeros [30, 31]. The probability mass function of a negative binomial random variable Y is given by:
See formula 2 in the supplementary files.
The mean and variance of the NB distribution are E(𝑦|𝜇,𝛿) = 𝜇 and 𝑣𝑎r(𝑦|𝜇,𝛿) = 𝜇(1 + 𝛿𝜇), where 𝛿 is the dispersion parameter [29]. The predictor variables are related to the parameter 𝜇 through the log-link function, defined as log 𝜇 = 𝑥i′𝛽.
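The stated mean and variance can be checked numerically. The sketch below assumes the common NB2 parameterization, in which the dispersion parameter δ enters through 1/δ. This parameterization is consistent with the moments given above but is an assumption here, since formula 2 is not reproduced in the text. The function name and parameter values are hypothetical.

```python
import math

def nb_pmf(y, mu, delta):
    """NB2 probability P(Y = y) with mean mu > 0 and dispersion delta > 0."""
    r = 1.0 / delta
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu))      # (1 / (1 + delta*mu)) ** (1/delta)
             + y * math.log(mu / (r + mu)))    # (delta*mu / (1 + delta*mu)) ** y
    return math.exp(log_p)

mu, delta = 0.8, 1.5  # hypothetical parameter values
support = range(500)  # the tail beyond this point is numerically negligible
nb_total = sum(nb_pmf(y, mu, delta) for y in support)
nb_mean = sum(y * nb_pmf(y, mu, delta) for y in support)
nb_var = sum((y - nb_mean) ** 2 * nb_pmf(y, mu, delta) for y in support)

print("mean:", nb_mean, "expected:", mu)
print("variance:", nb_var, "expected:", mu * (1 + delta * mu))
```

As δ approaches zero the variance 𝜇(1 + 𝛿𝜇) approaches 𝜇, so the Poisson model is recovered as a limiting case.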
Zero-inflated count regression models
In real-life data, a major source of over-dispersion is often a relatively large number of zero counts, and the resulting over-dispersion cannot be modeled accurately with the negative binomial model. In such cases, one may use zero-inflated Poisson or zero-inflated negative binomial models to fit the data [27]. Such models assume that the data are a mixture of two separate data-generating processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process.
Zero-inflated Poisson Regression Model
Excess zeros are a form of over-dispersion, and fitting a zero-inflated Poisson model can account for them, but other sources of over-dispersion must also be considered. If there are sources of over-dispersion that cannot be attributed to the excess zeros, failure to account for them constitutes a model misspecification, which results in biased standard errors. In ZIP models, the underlying Poisson distribution for the first subpopulation is assumed to have a variance that is equal to the distribution's mean. If this assumption is invalid, the data exhibit over-dispersion (or under-dispersion). The probability distribution of a zero-inflated Poisson random variable is given by:
See formula 3 in the supplementary files.
The response variable yi is a non-negative integer, μi is the expected Poisson count for the ith individual, and 𝜔i is the probability of extra zeros. The mean and variance of the ZIP distribution are E(Yi) = (1 − 𝜔i)𝜇i and 𝑣ar(Yi) = 𝐸(Yi)(1 + 𝜔i𝜇i). The parameters 𝜇i and 𝜔i depend on covariates xi and zi, respectively, where log(𝜇i) = xi'𝛽 and log(𝜔i/(1 − 𝜔i)) = zi'γ.
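The mixture structure of the ZIP model can be illustrated with a short numerical check. The sketch below assumes the standard ZIP form, in which a zero arises with probability ω from the zero-only process and otherwise from a Poisson process with mean μ. Since formula 3 is not reproduced in the text, this form, the logit link for ω, the function names, and the linear-predictor values are assumptions for illustration.

```python
import math

def zip_pmf(y, mu, omega):
    """ZIP probability: point mass omega at zero mixed with Poisson(mu)."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return omega + (1 - omega) * poisson
    return (1 - omega) * poisson

def zip_parameters(x_beta, z_gamma):
    """Map linear predictors to (mu, omega) via log and logit links (assumed)."""
    mu = math.exp(x_beta)
    omega = 1.0 / (1.0 + math.exp(-z_gamma))
    return mu, omega

# Hypothetical linear-predictor values for a single observation.
mu, omega = zip_parameters(x_beta=math.log(1.2), z_gamma=0.0)  # omega = 0.5
support = range(200)
zip_total = sum(zip_pmf(y, mu, omega) for y in support)
zip_mean = sum(y * zip_pmf(y, mu, omega) for y in support)
zip_var = sum((y - zip_mean) ** 2 * zip_pmf(y, mu, omega) for y in support)

print("mean:", zip_mean, "expected:", (1 - omega) * mu)
print("variance:", zip_var, "expected:", (1 - omega) * mu * (1 + omega * mu))
```

The variance exceeds the mean whenever ω > 0, which is why excess zeros alone already induce over-dispersion relative to the Poisson model.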
Zero-Inflated Negative Binomial Regression Model
Zero-inflated negative binomial regression is often used for modeling over-dispersed count outcome variables with excessive zeros. Furthermore, theory suggests that the excess zeros are generated by a separate process from the count values and that the excess zeros can be modeled independently. The probability distribution of a zero-inflated negative binomial response variable is given by:
See formula 4 in the supplementary files.
where δ > 0 is an over-dispersion parameter. The mean and variance of the ZINB model are E(𝑌i) = (1 − 𝜔i)𝜇i and 𝑉ar(𝑌i) = (1 − 𝜔i)(1 + 𝜔i𝜇i + 𝛿𝜇i)𝜇i. The parameters 𝜇i and 𝜔i depend on vectors of covariates xi and 𝑧i, respectively. The method of Fisher scoring is appropriate for obtaining the parameter estimates of ZINB regression models.
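The ZINB moments can be verified in the same way as for the other models. The sketch below assumes the standard ZINB mixture of a point mass at zero with an NB2 count distribution. Formula 4 is not reproduced in the text, so this form, the function names, and the parameter values are assumptions for illustration.

```python
import math

def nb_pmf(y, mu, delta):
    """NB2 probability with mean mu and dispersion delta (assumed form)."""
    r = 1.0 / delta
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def zinb_pmf(y, mu, delta, omega):
    """ZINB probability: point mass omega at zero mixed with NB2(mu, delta)."""
    if y == 0:
        return omega + (1 - omega) * nb_pmf(0, mu, delta)
    return (1 - omega) * nb_pmf(y, mu, delta)

mu, delta, omega = 1.2, 0.5, 0.3  # hypothetical parameter values
support = range(500)
zinb_total = sum(zinb_pmf(y, mu, delta, omega) for y in support)
zinb_mean = sum(y * zinb_pmf(y, mu, delta, omega) for y in support)
zinb_var = sum((y - zinb_mean) ** 2 * zinb_pmf(y, mu, delta, omega)
               for y in support)

expected_var = (1 - omega) * (1 + omega * mu + delta * mu) * mu
print("mean:", zinb_mean, "expected:", (1 - omega) * mu)
print("variance:", zinb_var, "expected:", expected_var)
```

Setting δ close to zero in this sketch brings the variance back to the ZIP value, which reflects the nesting of ZIP within ZINB.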
Hurdle Models
The concept underlying the hurdle model is that a binomial probability model governs the binary outcome of whether a count variable has a zero or a positive value. If the value is positive, the "hurdle is crossed," and the conditional distribution of the positive values is governed by a zero-truncated count model [28]. Hurdle count models are two-component models with a truncated count component for positive counts and a hurdle component that models the zero counts. The count model is typically a truncated Poisson or negative binomial regression (with log link). The probability mass function of the response variable y in the hurdle model is given by:
See formula 5 in the supplementary files.
where yi is the value of the dependent variable for the ith person (i = 1, …, n), zi is the vector of predictor variables in the zero (hurdle) part, xi is the vector of predictor variables in the count part, γ is the vector of coefficients belonging to z, β is the vector of coefficients belonging to x, fzero is a probability mass function for either a binary outcome (0, 1) or counts (0, 1, 2, 3…), and fcount is a probability mass function for counts (0, 1, 2, 3…).
The regression coefficients of the model are estimated by maximum likelihood. The fzero part, which determines whether yi = 0, is typically modeled with a binary logit (logistic regression) model, in which all counts greater than 0 are given a value of one [32].
Using a binary logistic regression model for this part, the probability of yi = 0 is denoted as:
See formula 6 in the supplementary files.
where zi represents the observed data and γ the vector of coefficients belonging to zi. The probability of a nonzero count is then given by 1 − ψi. The non-zero count part (fcount) is modeled with a truncated (yi > 0) count model. This is typically a truncated Poisson model, or a truncated negative binomial model in the case of over-dispersion.
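The two-part structure described above can be sketched as follows for the hurdle Poisson case. The zero probability ψ is taken as a given input, because its exact link to zi'γ follows formula 6, which is not reproduced in the text. The zero-truncated Poisson form is the standard one, and the function name and parameter values are hypothetical.

```python
import math

def hurdle_poisson_pmf(y, psi, mu):
    """Hurdle Poisson probability.

    psi is P(Y = 0) from the binary part; positive counts follow a
    zero-truncated Poisson(mu), rescaled by the chance of crossing the hurdle.
    """
    if y == 0:
        return psi
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    truncated = poisson / (1.0 - math.exp(-mu))  # condition on Y > 0
    return (1.0 - psi) * truncated

psi, mu = 0.6, 1.2  # hypothetical parameter values
support = range(200)
hurdle_total = sum(hurdle_poisson_pmf(y, psi, mu) for y in support)
hurdle_mean = sum(y * hurdle_poisson_pmf(y, psi, mu) for y in support)

print("P(Y = 0):", hurdle_poisson_pmf(0, psi, mu))
print("mean:", hurdle_mean)
```

Unlike the zero-inflated models, all zeros here come from the binary part, so ψ can be smaller or larger than the zero probability implied by the untruncated count distribution.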