Setting, design, and time dimension
A cross-sectional study design was employed in Ethiopia’s two largest manufacturing industries (the Metehara and Wonji sugar industries). Data were extracted from December 14th, 2021, to March 23rd, 2022. The study used a retrospective costing approach to estimate the economic burden of occupational injury at the employer level. The analysis was prevalence-based, focusing solely on the costs incurred in a specific year, irrespective of the date of injury.
Sample size and sampling techniques
The sample size was determined using the single population proportion formula with the following assumptions: a prevalence of occupational injury (p) of 78.3% [21], a 5% margin of error, a 95% confidence level, and a design effect of 1.5. The calculated sample size was 1,136 respondents, which was proportionally allocated to each industry based on its number of injured cases. The total number of injuries during the reference period (January 1st to December 30th, 2021) was determined from administrative injury records.
Source of data
The study drew on multiple data sources to reduce the limitations of relying on workers’ compensation data alone for occupational injury. Similar data from the different sources were then combined and used to complement one another in further analysis.
Data from secondary sources
Bureau of Labor and Social Affairs records were used to identify all injured cases in the reference period. The study included variables derived from insurance companies, such as the name of the industry, the premium payment, the number of injured workers (both non-fatal and fatal), the amount of compensation received, and the number of workdays missed. The researchers interviewed the security officers, account department officers, occupational health and safety officers, and production managers. Because accidents are heavily under-reported, the study combined the data obtained via questionnaire with administrative records.
Data collection techniques and procedures
Data from manufacturing industries
The study gathered industry data such as sick leave, number of injured workers, number of deaths, total number of days lost, sick leave pay, and insurance premiums from the manufacturing industries via an interviewer-administered questionnaire. Additional data were collected from the safety officers, plant managers, and industry insurance dealers.
Data from injured workers
The study collected primary data from injured workers who had missed at least one working day, to identify factors associated with variability in total cost. The study used a structured, interviewer-administered questionnaire adapted from International Labour Organization (ILO) injury statistics and modified to fit the industry context [22].
Techniques and approaches to cost estimation
Estimation of direct cost
The researchers used insurance agency data to estimate the direct costs of injury through the top-down approach (allocating portions of a known total expenditure to each of several injury categories). The median direct costs for medical expenses and compensation were calculated within each disability category and multiplied by the total number of injuries. The direct medical and compensation costs were calculated separately. Costs were converted to an international currency, the United States dollar (USD). For 2021, according to the Commercial Bank of Ethiopia, the average annual exchange rate between the USD and the Ethiopian Birr (ETB) was USD 1.00 = ETB 44.32 [23].
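The direct cost calculation described above can be illustrated with a minimal Python sketch. The per-case costs, the category names, and the injury counts below are hypothetical placeholders, not study data, and the sketch assumes that each category's median is multiplied by the number of injuries in that same category. Only the exchange rate (ETB 44.32 per USD) comes from the text.

```python
from statistics import median

ETB_PER_USD = 44.32  # 2021 average exchange rate reported in the text

# Hypothetical per-case direct costs in ETB, grouped by disability category.
costs_etb = {
    "temporary": {
        "medical": [400.0, 650.0, 900.0],
        "compensation": [1000.0, 1500.0, 2500.0],
    },
    "permanent": {
        "medical": [3000.0, 5000.0],
        "compensation": [20000.0, 30000.0],
    },
}

# Hypothetical number of injuries in each disability category.
n_injuries = {"temporary": 100, "permanent": 4}


def direct_cost_usd(costs, counts, rate=ETB_PER_USD):
    """Median cost per category x number of injuries, converted to USD.

    Medical and compensation costs are kept separate, as in the text.
    """
    result = {}
    for category, components in costs.items():
        result[category] = {
            name: median(values) * counts[category] / rate
            for name, values in components.items()
        }
    return result


estimates = direct_cost_usd(costs_etb, n_injuries)
total_direct_usd = sum(
    value for components in estimates.values() for value in components.values()
)
```

The median is used instead of the mean because cost data are typically right-skewed, so a few very expensive cases would otherwise dominate the category estimate.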
Estimation of the indirect cost of occupation-related injury
The indirect costs of occupation-related injuries were estimated using the friction cost method. The study accounted for employer productivity losses from absenteeism and presenteeism by applying a multiplier to each.
Absenteeism cost estimation method
The study estimated the productivity lost to absenteeism, that is, to workers’ absence from work, to quantify costs for employers. The cost was obtained by multiplying the total number of sick-leave days by the median daily wage and by the absenteeism multiplier. The median multiplier is 1.28, indicating that the cost of missed work to the employer exceeds the wage alone. The cost of absenteeism was calculated using the following formula:
Formula 1: $\text{Absenteeism cost} = \text{MLW} \times \text{NIE} \times \text{MDWE} \times 1.28$
Where:
MLW = the median number of workdays lost due to absenteeism in the defined period,
NIE = the total number of injured employees,
MDWE = the median daily wage of the employees,
1.28 = the median absenteeism multiplier, reflecting the view that the cost to the firm of missed work is often greater than the wage.
Presenteeism cost estimation method
The researchers estimated the productivity lost to presenteeism, that is, to workers’ reduced output while on the job. The cost of presenteeism was calculated using the following formula:
Formula 2: $\text{Presenteeism cost} = \text{MLW} \times \text{NIE} \times \text{MDWE} \times 1.5$
Where:
MLW = the median number of workdays lost due to presenteeism (workers being on the job but, because of illness or other medical conditions, not fully functioning),
NIE = the total number of injured employees,
MDWE = the median daily wage of the employees,
1.5 = the presenteeism multiplier.
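As a hedged illustration, both productivity-loss estimates can be read from the variable definitions above as the product of median lost workdays, number of injured employees, median daily wage, and a multiplier. The sketch below assumes that reading. The function name and all input values are hypothetical; only the multipliers (1.28 and 1.5) are taken from the text.

```python
ABSENTEEISM_MULTIPLIER = 1.28   # median absenteeism multiplier from the text
PRESENTEEISM_MULTIPLIER = 1.5   # presenteeism multiplier from the text


def productivity_cost(median_lost_days, n_injured, median_daily_wage, multiplier):
    """Productivity loss = MLW x NIE x MDWE x multiplier."""
    return median_lost_days * n_injured * median_daily_wage * multiplier


# Hypothetical inputs, for illustration only (wages in ETB).
n_injured = 100
median_daily_wage = 150.0

absenteeism_cost = productivity_cost(
    median_lost_days=10,
    n_injured=n_injured,
    median_daily_wage=median_daily_wage,
    multiplier=ABSENTEEISM_MULTIPLIER,
)
presenteeism_cost = productivity_cost(
    median_lost_days=4,
    n_injured=n_injured,
    median_daily_wage=median_daily_wage,
    multiplier=PRESENTEEISM_MULTIPLIER,
)

# Indirect cost as the sum of the two productivity losses.
indirect_cost = absenteeism_cost + presenteeism_cost
```

A multiplier above 1 expresses the idea that a lost day costs the employer more than the wage paid for it, for example through disrupted team output or the effort of covering the absent worker.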
Operational definitions
Absenteeism: refers to the productivity lost when someone is absent for at least one day from the workplace because of an injury or illness for which the employee is accountable.
Total cost: refers to the sum of the direct and indirect costs of occupation-related injuries.
Direct cost: refers to expenditures associated with the use of medical facilities and reimbursement (repayment) for payments made by organizations and insurance providers.
Indirect costs: refers to losses in production due to absence from work.
Friction cost: refers to the approach that measures the indirect cost of injury by estimating the cost of replacing those killed or temporarily or permanently disabled with other existing workers during the friction period.
Occupational injury: refers to any personal injury, such as a cut, fracture, or sprain, that results from a work-related event and leads to an absence from work of at least one day.
Perspective (the level of analysis): the point of view from which an analysis was conducted.
Presenteeism: refers to the practice of coming to work despite illness, injury, anxiety, etc., which often results in reduced productivity and performance.
Unsafe act: refers to the performance of a task or other activity in a manner that may threaten the health and safety of workers. It includes improper use of PPE, operating equipment at an unsafe speed, bypassing or removing safety devices, using defective equipment, using tools for other than their intended purpose, working in hazardous locations without adequate protection or warning, and improper equipment repair. Respondents were asked 15 questions; each "yes" response was scored 1 and each "no" response 0. The proportions of unsafe acts were then computed by pooling the multiple responses.
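The scoring of unsafe acts can be sketched as follows. The text does not spell out how the responses were pooled, so this sketch assumes one plausible reading: the pooled proportion is the number of "yes" answers divided by the total number of answers across all respondents and items. The function names and the response data are hypothetical, and only three items are shown instead of 15 to keep the example short.

```python
def pooled_unsafe_act_proportion(responses):
    """Share of 'yes' (1) answers among all answers, pooled over respondents."""
    total_answers = sum(len(answers) for answers in responses)
    total_yes = sum(sum(answers) for answers in responses)
    return total_yes / total_answers


def per_item_proportions(responses):
    """Share of respondents answering 'yes' (1) to each item."""
    n_respondents = len(responses)
    return [sum(item) / n_respondents for item in zip(*responses)]


# Hypothetical data: 4 respondents x 3 items, 1 = yes, 0 = no.
responses = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]

pooled = pooled_unsafe_act_proportion(responses)
by_item = per_item_proportions(responses)
```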
Data quality assurance
Data were gathered by trained data collectors (n = 6) with a degree in occupational health and safety. The principal investigator trained the data collectors and supervisors for two days. The training focused on the data collection tools, the study procedures, and research ethics. The questionnaires were translated into Amharic by an experienced translator and back-translated into English by an independent translator to check consistency. Before data collection, a pretest was conducted on 10% of the sample size outside the study area. The principal investigator closely monitored the field-level data collection process daily and checked the collected data for completeness and consistency before further analysis. When errors were noticed, the investigator approached the data collectors and the errors were corrected in the field.
Data processing and analysis
A codebook was developed to transfer the collected data to a code sheet. The researchers cleaned the data, inspected the distributions, and performed contingency cleaning for accuracy. Cases were sorted to identify missing values. Continuous variables were coded, some coded variables were recoded, and non-overlapping numerical codes were assigned. All compiled data were recorded in an Excel spreadsheet and exported to Stata 14 for further analysis. The analysis proceeded stepwise. First, the characteristics of the study participants were analyzed and described. Then the direct, indirect, and total costs of occupation-related injuries were analyzed. All cost information was converted to United States dollars (USD). Multicollinearity was checked using a correlation matrix (coefficient > 0.8), the variance inflation factor (VIF > 10), and tolerance (< 0.1). Model fitness was tested using the Hosmer–Lemeshow test (p-value > 0.05). Next, the cost data were checked for normality using plots (Q–Q plots and histograms) and the Kolmogorov–Smirnov test (P > 0.05). The cost data were found to be right-skewed, and a log transformation was performed to approximate normality. Given the non-normal distribution of total cost, the study employed a generalized linear model (GLM) with a gamma family and log link function to identify predictors of total cost. Exponentiated coefficients (exp(b)) with 95% confidence intervals were used to express the direction and strength of association. Variables with p-values < 0.05 were considered statistically significant.
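To make the interpretation of exponentiated coefficients concrete, the sketch below shows how a coefficient b and its standard error from a log-link model translate into a cost ratio and a Wald-type 95% confidence interval. It is an illustration only, not the study's Stata code: the function name, the coefficient, and the standard error are hypothetical, and the interval assumes the usual normal approximation (z = 1.96).

```python
import math


def exp_coefficient(b, se, z=1.96):
    """Return exp(b) with a Wald-type confidence interval.

    With a log link, exp(b) is the multiplicative change in expected cost
    for a one-unit increase in the predictor.
    """
    ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
ratio, lower, upper = exp_coefficient(b=0.40, se=0.15)

# The association is significant at the 5% level when the interval excludes 1.
significant = lower > 1.0 or upper < 1.0
```

An exp(b) above 1 indicates that the predictor is associated with a higher expected total cost, and a value below 1 with a lower one.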