Quality of Routine Health Information System Data and Associated Factors Among Departments in Public Health Facilities of Harari Region, Ethiopia


Background: Despite improvements in the knowledge and understanding of the role of health information in the global health system, the quality of data generated by routine health information systems remains very poor in low- and middle-income countries. There is a paucity of studies on what determines data quality in health facilities in the study area. Therefore, this study aimed to assess the quality of routine health information system data and associated factors in public health facilities of Harari Region, Ethiopia. Methods: A cross-sectional study was conducted in all public health facilities in Harari Region, Ethiopia. Department-level data were collected from the respective department heads through document reviews, interviews, and observation checklists. Descriptive statistics were used to describe data quality, and multivariate logistic regression was run to identify factors influencing data quality. The level of significance was declared at P-value <0.05. Results: Good quality data were found in 51.35% (95% CI, 44.6-58.1) of the departments in public health facilities in Harari Region. Departments in health centers were 2.5 times more likely to have good quality data than departments in health posts. The presence of trained staff able to fill reporting formats (AOR=2.474; 95% CI: 1.124-5.445) and provision of feedback (AOR=3.083; 95% CI: 1.549-6.135) were also significantly associated with data quality. Conclusion: The level of good data quality in the public health facilities was below the expected national level. Training should be provided to increase the knowledge and skills of health workers.


Background
The health information system (HIS) is one of the six building blocks of a health system, designed for the generation and use of information for other functions of the health system (1). The purpose of a health information system is to routinely generate quality health data that provide specific evidence to support decisions on health issues (2). In the "One plan, one budget, and one report" policy of Ethiopia, HIS is the core information system (3). The information revolution is one of the four big agendas of Ethiopia's Health Sector Transformation Plan II (HSTP-II); it refers to the phenomenal advancement in the methods and practice of collecting, analyzing, presenting, and disseminating information. Data quality, defined as the data's fitness to serve its purpose in a given context in terms of accuracy, completeness, and timeliness (4), is an essential element of this information revolution agenda (5).
Routine health care data are of little value unless they are accurate, processed, and used to inform decisions that respond to the local situation (6).
Improved health system performance is directly linked with the quality and use of routine data in a country's HIS (5,7).
Despite improvements in the knowledge and understanding of the role of health information in the global health system, the quality of data generated by routine HIS is still very poor in low- and middle-income countries (8). The quality of data was found to be between 34% and 72% in many African countries (9). The large volume and variety of data generated in public health facilities are overlooked because of their limited quality (10-13). In Ethiopia, data quality is below the 80% national expectation (14), and data completeness, accuracy, and timeliness were found to be between 33% and 78% in different areas (4,5,14-18).
All functions of the health system and public health policy rely heavily on the presence and use of quality HIS data (3,19). However, the lack of quality data and poor data use are affecting the health system's performance and the health of society. This is evidenced by frequent overstocking and understocking of supplies, poor detection and management of outbreaks, and scarcity of human resources at different times (20).
Studies have identified that data quality is associated with various technical, behavioral, and organizational determinants such as personal knowledge (21), negligence and data manipulation for the sake of competition (22), motivation (23), user-friendliness of reporting formats, standardized indicators (24), training (25,26), feedback (14), supervision (27), sense of responsibility (28), and data use (29,30). Although studies have been conducted on data quality, none has been conducted at the department level in this study area to explore the factors affecting data quality. Moreover, the few studies conducted did not quantify the magnitude of the associations. Therefore, this study aimed to assess the magnitude of the quality of routine health information system data and its determinants among public health facilities.

Study area and study period
The study was conducted in public health facilities of Harari Regional State, Ethiopia, from July 1 to 15, 2020. Located 518 km east of Addis Ababa, Harari Region is one of the ten regional states in Ethiopia, with an estimated area of 311.25 km². Based on the 2007 national census conducted by the Central Statistical Agency of Ethiopia (CSA), Harari Region has a total population of 183,415 and has 9 districts (6 urban and 3 rural) and 36 kebeles (the smallest administrative units in Ethiopia) (31). There were seven hospitals in Harari Region, of which one was owned by the Harari Regional Health Bureau while the rest were owned by other governmental and private organizations. Of these, two hospitals were governmental public health facilities. There were also 8 public health centers, 32 health posts, 10 not-for-profit private clinics, and 15 for-profit private clinics in Harari Region.

Study population
The study population comprised all departments that were implementing routine health management information systems (HMIS) in all public health facilities of Harari Regional State.

Sample size determination and sampling procedure
The sample size of the study was determined by using the single population proportion formula:

$$n = \frac{(Z_{\alpha/2})^2 \, P(1 - P)}{d^2}$$

where $n$ = sample size, $Z_{\alpha/2}$ = standard normal value corresponding to a significance level (α) of 0.05 = 1.96, $P$ = magnitude of the data quality of routine health information system among departments in public health facilities of Dire Dawa (75.3%) (14), and $d$ = degree of precision = 0.05.

Accordingly, since the total number of departments (245) was less than 10,000, the finite population correction formula was used and gave $n_f = 314 / (1 + 314/245) \approx 138$. However, since the departments implementing health information systems were found to be manageable in number, a census of all 245 departments found in all 42 public health facilities (8 health centers, 32 health posts, and 2 hospitals) was conducted.
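The sample size arithmetic above can be checked with a short sketch. The function names are illustrative and not from the study. Note that the single population proportion formula with the stated inputs (P = 0.753, d = 0.05, Z = 1.96) gives about 286; the initial size of 314 used in the correction is taken from the text as given, and the step between the two figures (possibly an allowance for non-response, which is only an assumption here) does not appear in the text.

```python
import math


def single_proportion_sample_size(p, d, z=1.96):
    """Unadjusted sample size for estimating a single proportion."""
    return (z ** 2) * p * (1 - p) / (d ** 2)


def finite_population_correction(n0, population):
    """Shrink an initial sample size n0 for a small source population."""
    return n0 / (1 + n0 / population)


# Inputs stated in the text: P = 75.3%, d = 0.05, Z = 1.96.
n_unadjusted = single_proportion_sample_size(p=0.753, d=0.05)

# The text applies the correction to 314 (taken as given) with 245 departments.
n_corrected = finite_population_correction(n0=314, population=245)

print(math.ceil(n_unadjusted))  # unadjusted size, rounded up
print(round(n_corrected))       # corrected size, as in the text
```

Because the corrected size is below the 245 departments that actually exist, taking a census of all departments, as the study did, exceeds the calculated requirement.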

Data collection instrument
The questionnaire was adapted from the Performance of Routine Information System Management (PRISM) assessment tool version 3.1 (32) and used with minor modifications to collect quantitative data. It comprised four sections. The first section was composed of questions related to socio-demographic characteristics of the department heads such as age, educational status, working experience, professional category, salary, residence, and others. The second and third sections of the questionnaire included items assessing the technical, organizational, and behavioral factors associated with the quality of routine health information system data. Observations, interviews, and document reviews guided by an observation checklist (the fourth section of the questionnaire) were used to collect data on data quality from all the departments through their respective department heads or representatives.

Data collection procedures
Twelve health professionals who had basic data management training and prior experience of data collection were assigned to data collection, and four health professionals who were members of the HIS monitoring team were assigned to supervision. Before the data collection, a two-day training was provided on the purpose of the study, how to collect data, and ethical issues, emphasizing the importance of the safety of the participants and of data quality.
The data were collected by going to all the health facilities, explaining the aim of the study, ensuring the confidentiality of the data, obtaining written consent from each facility head and the participants, observing and interviewing to fill the checklist, and distributing the questionnaire to the department heads to read and fill the rest.

Study variables
Dependent variable
Data quality was the dependent variable of the study.

Independent variables
The independent variables included: organizational variables (training, feedback, supervision, computer, internet, reward, engagement in HIS activities, performance review meeting, and data use); technical variables (presence of standard indicators, report formats, and a trained person able to fill the formats); and behavioral variables (motivation, attitude, data manipulation for competition, negligence, sense of responsibility, knowledge, and data quality checking skills).
Poor quality data: Data that fail to meet any one of the three criteria (accuracy <80%, completeness <85%, or timeliness <85%).
Completeness: refers to whether the expected data elements are filled in the report format and on the source documents. Data completeness is the average of the source document (registration) content completeness and the report content completeness. The data are complete if the average is >=85% (33).
Register content completeness: was checked by taking the last 15 cases from the registration of the department for the selected month/quarter and measured by dividing the number of completely recorded cases by the total cases checked. If the total cases/entries registered in the register are less than 15, the available cases are considered.
Report content completeness: at the department level, report content completeness was measured by dividing the number of data elements reported in the report format by the total number of data elements the department was expected to report (32). For departments that did not keep a copy of the report, it was taken from the HMIS unit.
Data accuracy: was measured by recounting already reported data elements/indicators from the source document/register and comparing the result with the value reported in the report format. The data elements/indicators for which the verification factor (recounted value from the source document divided by the value reported in the HMIS report) fell between 0.9 and 1.1 were regarded as accurate (having a normal verification factor). The department's data accuracy was determined as the number of accurate data elements/indicators divided by the total number of data elements checked. The department's data are accurate if this proportion is >=80% (27).
Timeliness: was assessed as report submission within the accepted time period by observing the reporting date on the reporting form of two randomly selected monthly reports. Departments at the health posts were expected to report from the 20th to the 22nd of the month, and departments at the health centers and hospitals were expected to report to the next level from the 20th to the 24th. The data of the department are timely if the average is >=85% (33).
Knowledge on HIS: It was the knowledge of rationale of routine HIS data that was measured by using the three knowledge-related open-ended questions which have a total raw score of 7 and for which the answers were coded according to the themes on the PRISM assessment user guide (32). The 50% mean score was used to classify the knowledge as good or poor.
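The operational definitions above can be combined into a small scoring sketch. The function names and the example department are hypothetical; the thresholds (verification factor 0.9-1.1, accuracy >=80%, completeness >=85%, timeliness >=85%) are those stated in the definitions. Treating a department as having good quality data only when all three criteria are met is inferred as the complement of the poor quality definition, and the handling of a zero reported value is an assumption.

```python
ACCURACY_MIN = 0.80
COMPLETENESS_MIN = 0.85
TIMELINESS_MIN = 0.85


def accuracy(element_pairs, low=0.9, high=1.1):
    """Share of data elements whose verification factor lies in [low, high].

    element_pairs holds (recounted_from_register, reported_in_hmis) values.
    """
    accurate = 0
    for recounted, reported in element_pairs:
        if reported == 0:
            # Assumption: with nothing reported, only a zero recount matches.
            ok = recounted == 0
        else:
            ok = low <= recounted / reported <= high
        accurate += ok
    return accurate / len(element_pairs)


def completeness(complete_cases, cases_checked, elements_reported, elements_expected):
    """Average of register content and report content completeness."""
    register = complete_cases / cases_checked
    report = elements_reported / elements_expected
    return (register + report) / 2


def timeliness(on_time_flags):
    """Share of the sampled monthly reports submitted within the window."""
    return sum(on_time_flags) / len(on_time_flags)


def classify(acc, comp, time):
    """Good only if all three thresholds are met (inferred complement of 'poor')."""
    good = acc >= ACCURACY_MIN and comp >= COMPLETENESS_MIN and time >= TIMELINESS_MIN
    return "good" if good else "poor"


# Hypothetical department: 5 elements recounted, last 15 register cases,
# 40 expected report elements, 2 sampled monthly reports.
acc = accuracy([(50, 50), (46, 50), (30, 40), (21, 20), (10, 10)])
comp = completeness(complete_cases=13, cases_checked=15,
                    elements_reported=38, elements_expected=40)
time_score = timeliness([True, True])

print(acc, round(comp, 3), time_score, classify(acc, comp, time_score))
```

In this made-up case one late report out of the two sampled would drop timeliness to 50% and move the department to the poor category, which shows how sensitive the composite measure is to a single dimension.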

Data quality control
The questionnaire was pre-tested on 12 departments in health facilities outside Harari Region to check for ambiguity, consistency, and acceptability of the questionnaire as well as the time needed to fill it. The necessary modifications were made before the actual data collection.
The quality of data was monitored frequently both in the field and during data entry. In the field, this was done through close supervision of the data collectors. All completed questionnaires were examined for completeness and consistency during data collection. Incomplete or unclearly filled questionnaires were given back to the study participants immediately.

Data processing and analysis
Data were entered using EpiData and exported to SPSS software version 25 for data recording, cleaning, and statistical analysis. Descriptive statistics (frequencies, percentages, tables, and figures) were used to describe the departments in the public health facilities, and the overall data quality was categorized as poor or good. Bivariate logistic regression analysis was done to identify variables that were candidates for multivariate analysis. All variables that showed an association on bivariate analysis at a liberal P-value of < 0.25 were considered for inclusion in the multivariate analysis. Afterwards, multivariate analysis was done to control the confounding effect of other variables and to identify independent predictors of routine health data quality in the health facilities. The magnitude and direction of the relationship between the variables were expressed as odds ratios (OR) with 95% CI, and a P-value < 0.05 was used to declare statistical significance. Model fitness was checked using the Hosmer-Lemeshow test at a P-value of >0.05, and a multicollinearity check was also carried out.
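As a hedged illustration of how an odds ratio and its 95% CI relate to a logistic regression coefficient, the sketch below works backwards from the feedback estimate reported in the abstract (AOR = 3.083; 95% CI: 1.549-6.135). It assumes the interval is a standard Wald interval, exp(β ± 1.96·SE), and that significance is judged with a two-sided Wald test; the study's SPSS output is not available here, so the recovered coefficient and standard error are approximations, and the function names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal value for a 95% interval


def coefficient_from_or(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and its standard error (Wald assumption)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def or_with_ci(beta, se):
    """Odds ratio and 95% Wald confidence limits from a coefficient."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


def wald_p_value(beta, se):
    """Two-sided P-value for H0: beta = 0 under a normal approximation."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


beta, se = coefficient_from_or(3.083, 1.549, 6.135)
aor, ci_low, ci_high = or_with_ci(beta, se)
p_value = wald_p_value(beta, se)

print(round(beta, 3), round(se, 3))
print(round(aor, 3), round(ci_low, 3), round(ci_high, 3))
print(p_value < 0.05)
```

The round trip reproduces the reported limits to within rounding, which is consistent with (but does not prove) a Wald interval on the log-odds scale.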

Result
Description of the departments
Of the 245 departments found in the 42 public health facilities of Harari Regional State, 222 departments participated in the study, a 91% response rate. Among the 222 departments, 103 (46.39%), 82 (36.94%), and 37 (16.67%) were from the health posts, health centers, and hospitals respectively.

Data quality in terms of completeness
Of the 17,589 data elements checked for report content completeness, 16,415 (93%) were completely filled in the report format. Among the 5,230 cases checked for registration content completeness, more than two thirds (69.6%) were completely registered while 1,589 (30.4%) were incompletely registered. Of the 222 departments, 89 (40%) had incomplete data whereas 133 (60%) had complete data (Fig. 2).
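The completeness percentages above follow directly from the reported counts; a minimal sketch of the arithmetic is given below. Only the counts come from the text; the variable names and the `pct` helper are illustrative.

```python
def pct(part, whole, digits=1):
    """Percentage of part in whole, rounded to the given number of digits."""
    return round(100 * part / whole, digits)


# Counts reported in the text.
report_elements_checked = 17589
report_elements_filled = 16415
register_cases_checked = 5230
register_cases_incomplete = 1589
departments_total = 222
departments_complete = 133

# Completely registered cases are the remainder of the cases checked.
register_cases_complete = register_cases_checked - register_cases_incomplete

report_completeness = pct(report_elements_filled, report_elements_checked)
register_completeness = pct(register_cases_complete, register_cases_checked)
register_incompleteness = pct(register_cases_incomplete, register_cases_checked)
departments_complete_pct = pct(departments_complete, departments_total, digits=0)

print(register_cases_complete)
print(report_completeness, register_completeness, register_incompleteness)
print(departments_complete_pct)
```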

Data quality in terms of timeliness
The departments in the health posts were expected to submit their reports from the 20th to the 22nd of each month, while the departments at the health centers and hospitals were expected to report from the 20th to the 24th. Of the 222 departments whose data were checked for timeliness, the majority (93.7%) submitted their reports on time while 14 (6.3%) did not. Ninety-four (91.26%), eighty-two (100%), and thirty-two (86.48%) of the departments in the health posts, health centers, and hospitals respectively submitted their reports according to their respective schedules (Fig. 3).
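The timeliness figures by facility type can be tallied to confirm that they add up to the overall result. The sketch below uses the on-time counts from the text together with the department totals per facility type given in the description of the departments (103, 82, and 37); the dictionary layout is illustrative. The hospital share computes to 86.49% when rounded to two decimals, so the 86.48% in the text appears to be truncated rather than rounded.

```python
# (on-time departments, total departments) per facility type, from the text.
timeliness_counts = {
    "health posts": (94, 103),
    "health centers": (82, 82),
    "hospitals": (32, 37),
}

on_time_total = sum(on_time for on_time, _ in timeliness_counts.values())
departments_total = sum(total for _, total in timeliness_counts.values())
late_total = departments_total - on_time_total

overall_pct = round(100 * on_time_total / departments_total, 1)
by_facility_pct = {
    facility: round(100 * on_time / total, 2)
    for facility, (on_time, total) in timeliness_counts.items()
}

print(on_time_total, departments_total, late_total, overall_pct)
print(by_facility_pct)
```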
Factors associated with quality of routine health information system data
Factors associated with the quality of routine health information system data on logistic regression in public health facilities of Harari Region, Ethiopia, 2020 (N = 222). COR: crude odds ratio; AOR: adjusted odds ratio; 1R: reference category; *P-value < 0.05 from multivariate analysis.

Discussion
In this study, 129 (58.1%) of the departments had accurate data. This is lower than the accuracy reported from Hadiya zone, Southern region of Ethiopia, where 76% of the departments at the health centers had accurate data (16), and lower than the 79% reported in Nigeria (34). The difference might be due to differences in the type of facilities and in the level of feedback provided to the departments: 95.8% of the departments in Hadiya zone (16) received feedback, compared with 61.7% of the departments in Harari Region. Also, the verification factor interval used to measure data accuracy in Nigeria was wider (0.85-1.15) (34) than the interval used in this study (0.9-1.1). Generally, data accuracy can be affected by errors that occur during data entry, by intentional manipulation of the data for reasons such as competition among staff and facilities, by false reports made to inflate achievement, and by reports not made on time.
In this study, the 69.6% registration (source document) content completeness was lower than the 93% report content completeness. This is supported by the recently published study which was conducted in East Wollega where the 78.2% registration content completeness was less than the 86% report content completeness indicating that the health workers focus on managing patients rather than recording data due to the work load and lack of commitment to the data (35).
The 93.7% data timeliness found in this study was close to the 100% timeliness reported for Harari Region in the data quality review conducted by the Ethiopian Public Health Institute (17), but higher than the timeliness reported from other parts of Ethiopia: 70% in East Wollega and 89% in West Wollega (36). The easy accessibility of the health facilities in Harari Region is a possible explanation for the difference.
The study revealed that nearly half (51.35%) of the departments implementing the routine health information system had good data quality. This is similar to the findings from many developing countries, where data quality falls between 34% and 72% (9). However, it is lower than the finding of the study conducted in Dire Dawa, which reported good quality data in three fourths (75.3%) of departments (14). This might be because of differences in the way the dimensions of data quality were measured: in Dire Dawa, completeness was measured in terms of report completeness, while in this study it was measured in terms of both registration content completeness and report content completeness. It might also be attributed to the effect of coronavirus disease (COVID-19) on health information system performance, including data quality, because this study was conducted while COVID-19 was challenging the health system in general.
Departments in the health centers were 2.5 times more likely to have good quality data than departments in the health posts. This is supported by findings from the pioneering regions of Ethiopia, in which data quality was better at the health centers and hospitals than at the health posts (37). The low level of education among the staff at the health posts (all are diploma holders or below), the larger amount of data collected by a limited number of health extension workers, and the lack of HMIS personnel who closely monitor data quality, as compared to the health centers and hospitals, are possible reasons for the variation. It might also be due to the greater attention given by the government and other stakeholders to the health centers through HMIS capacity building and mentorship.
... (14). Training can clarify HIS-related activities and tools and increases familiarity with HIS tools such as registers, reporting formats, and information communication technology software.
Although supportive supervision was associated with data quality on bivariate logistic regression, it was not significantly associated with data quality on multivariate logistic regression in this study. This differs from the finding of the study conducted in Gurage Zone, in which supervision was associated with community health information system performance (data quality and use) (27). The difference might be attributed to the quality of supervision, as noted in Tanzania (21). Another possible explanation is that, in most practical cases, supervision aims to find fault rather than to be supportive. Yet it is supportive supervision that helps departments fill their gaps in data recording, processing, analysis, reporting, and data quality checking.
A limitation of this study is that it could not show the consistency between the data in the routine health information system and the same data in the real world, since the study addressed only three dimensions of data quality. Future studies should incorporate qualitative methods to gain deeper insight into the behavioral factors that influence data quality.

Conclusion
The level of good data quality among the departments in the public health facilities of Harari Region was below the 80% expected national level. The level of refresher training given to the staff was very low. The type of facility, the lack of trained personnel able to fill the formats, and feedback were the factors significantly associated with data quality on both bivariate and multivariate logistic regression analysis. Continuous in-service HMIS-related refresher training should be arranged and provided by the Harari Regional Health Bureau and other stakeholders to increase the knowledge and skills of the health workers. Supervisors at different levels of Harari Region, particularly the woreda health offices, should also provide supportive supervision focused on data quality and give feedback to the departments regularly.

... was written from the Harari Regional Health Bureau. Informed, voluntary, written and signed consent was obtained from each health facility's managers and study participants before data collection started. The collected data were kept confidential without the names of the study participants.

Consent for publication:
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Figure 1
Level of data accuracy among departments found in different public health facility types of Harari Region, Ethiopia, 2020 (N=222).

Figure 2
Level of completeness of data among departments in different types of public health facilities in Harari Region, Ethiopia, 2020 (N=222).

Figure 3
Level of data timeliness among departments in public health facilities of Harari Region, Ethiopia, 2020 (N=222).

Figure 4
The level of data quality among departments found in different facility types of public health facilities in Harari Region, Ethiopia, 2020 (N=222).