Building a Centre of Excellence in Data Management in East Africa

Background: In resource-limited settings, there is a paucity of high-quality data management systems for clinical research. As a result, data are often managed in high-income countries, disadvantaging researchers at the sites where the data are collected. An institutional data management system that addresses the data collection concerns of collaborators and sponsors is a key element of institutional capacity for high-quality research. Our goal was to build a local data management center to streamline data collection and validation in compliance with international regulatory bodies.
Methods: Leveraging established collaborations between the Office of Cyber Infrastructure and Computational Biology (OCICB) of the National Institutes of Health (NIH) and the Johns Hopkins University School of Medicine in the United States, the Infectious Diseases Institute (IDI) at Makerere University built a data management coordinating center. This included mentorship from the NIAID International Centers for Excellence in Research and training of key personnel at a functioning data center in South Africa. The number of studies, case report forms processed, and publications emanating from studies using the data management unit since its inception were tabulated.
Results: The Infectious Diseases Institute data management core began processing data in 2009 with 3 personnel, hardware (network-enabled scanners, desktops, and a server held in Bethesda with nightly back-up) and software licenses, in addition to on-site support from the NIH. In the last 10 years, 850,869 pages of data have been processed from 60 studies in Uganda, across sub-Saharan Africa, Asia and South America. Real-time data cleaning and data analysis occur routinely and enhance clinical research quality; a total of 212 publications from IDI investigators have been published over this period. Apart from the back-up services provided by the NIH, the center is now self-sustaining from fees charged to individual studies.
Conclusion: Collaborative partnership among research institutions enabled the IDI to build a core data management and coordination center to support clinical studies, build institutional research capacity, and to advance data quality and integrity for the investigators and sponsors.

INTRODUCTION
Research institutions are generating an increasing volume and complexity of data as the number of research studies in sub-Saharan Africa (SSA) continues to grow [1]. Data management is a critical aspect of high-quality research. Researchers, software analysts and developers, and biostatisticians need to collaborate using a data management system that addresses each stakeholder's needs, including accurate data in real time, easily exportable data to create regular aggregate data reports, simple data integration, data dictionaries, and edit-check validations [1]. Good data management is not a goal in itself, but rather a key component of high-quality research [5].
Importantly, data are often managed by developed-country partners and held outside the country where the data are generated [2]. This data ownership arrangement necessarily disadvantages local, indigenous researchers in the analysis and publication of data. The development of new, user-friendly data management systems and technology has allowed many countries to manage their own data within country. With the help of research capacity building programs, this has enabled indigenous researchers to own and analyze the data which, in turn, has led to sustainable capacity development. When data management is handled in the country where the data are collected, local investigators develop expertise through regular interactions with data managers.
Greater involvement with databases also increases insight into efficient and pragmatic data collection and, subsequently, improved protocol development skills.
The ultimate goal is to provide high-quality, reliable data that are collected in accordance with the protocol and that require less post-hoc cleaning. In the past, in-country data management was often handled using double data entry as the quality control process. A large amount of tedious direct data entry was often relegated to relatively poorly paid personnel without advanced degrees. Even with meticulous quality control procedures, there was a high likelihood of errors that had to be adjudicated. Study staff were often forced to do data entry because of the consistent backlog, instead of concentrating on clinical work. With increasing data volumes, query resolution and auditing of data entry and quality became more challenging; auditing data trails is important because it monitors modifiable factors such as human error and allows compliance with regulatory bodies. Manual entry led to long delays between patient assessments at the clinical sites and the availability of that information in a clean study database. In turn, accurate progress reports to manage implementation bottlenecks and corrective action were also delayed [6]. Real-time recognition of clinical trials attaining statistical stopping rules was a source of participant and institutional risk. Finally, delays in producing clean datasets can result in repetitive case report form (CRF) errors that become more difficult to correct with time, and may also delay publication of study results [9]. We describe herein the adoption of an institutional data management system that addressed many of the concerns facing African research centers of excellence [3].
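The double data entry process described above can be illustrated with a minimal sketch: two operators key the same CRF independently, and any fields that disagree are flagged for adjudication. This is a generic illustration of the technique, not the system used at the IDI, and the field names are hypothetical.

```python
def compare_entries(entry_a, entry_b):
    """Compare two independent keyings of the same CRF and return
    the fields that disagree, for manual adjudication."""
    discrepancies = {}
    for field in entry_a.keys() | entry_b.keys():
        a, b = entry_a.get(field), entry_b.get(field)
        if a != b:
            # Keep both keyed values so an adjudicator can check the paper form
            discrepancies[field] = (a, b)
    return discrepancies

# Hypothetical example: the same form keyed by two operators,
# with a transposition error in one field
first_pass = {"patient_id": "UG-014", "weight_kg": "62", "visit_date": "2019-03-01"}
second_pass = {"patient_id": "UG-014", "weight_kg": "26", "visit_date": "2019-03-01"}
print(compare_entries(first_pass, second_pass))
# {'weight_kg': ('62', '26')}
```

Even this simple comparison shows why the approach is labor-intensive: every discrepancy still requires a person to consult the paper form and adjudicate, which is what motivated the move away from double data entry.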

Key features of a data management system appropriate for resource-limited settings
We established a data management unit that maintains high quality data and is able to meet stakeholder needs in a timely manner within a supportive institutional infrastructure at the IDI, as a model for in-country data management in resource-limited settings. Attention was paid to cost, since studies would pay for the service to ensure sustainability. This was accomplished by building data management skills in-house. With the growing number of multi-site remote clinical trials at the IDI, the key features of an ideal data management system were:
- Simple technology and reliable infrastructure for data capture, with the option of paper CRFs and electronic transmission at convenience.
- Simple technology that does not require advanced hardware, with minimal need for continuous electricity supply, and software for data transmission compatible with infrastructure at study sites.
- A user-friendly data management system with semi-automated data entry.
- Efficient study start-up through case report form templates, database piloting, and standardized data operating procedures.
- The ability to create edit checks to increase data quality, and automated weekly reports interpretable with limited training, allowing regular data cleaning, particularly for missingness.
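The edit checks listed above can be sketched generically: each incoming record is screened for missing required fields and out-of-range values, and any failures become queries sent back to the site. This is a minimal illustration of the technique, not the vendor system's actual check language, and the field names and ranges are hypothetical.

```python
# Hypothetical range and required-field rules of the kind an edit-check
# configuration would encode (not taken from any real study).
RANGE_CHECKS = {"age_years": (0, 120), "weight_kg": (1, 300)}
REQUIRED = ["patient_id", "visit_date", "age_years"]

def run_edit_checks(record):
    """Return a list of query texts for one CRF record: one per
    missing required field or out-of-range value."""
    queries = []
    for field in REQUIRED:
        if not record.get(field):
            queries.append(f"{field}: missing")
    for field, (lo, hi) in RANGE_CHECKS.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            queries.append(f"{field}: {value} outside [{lo}, {hi}]")
    return queries

record = {"patient_id": "UG-014", "visit_date": None, "age_years": 134}
print(run_edit_checks(record))
# ['visit_date: missing', 'age_years: 134 outside [0, 120]']
```

Running such checks automatically against every batch of forms is what makes weekly cleaning reports feasible: the output is a plain list of queries that staff with limited training can act on.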
Following a needs assessment that helped to characterize scientists' research methods and data management practices, gaps and barriers were highlighted [6]. A decentralized data management system, ©DataFax/DFdiscover, was chosen, in which each site transmitted data through a fax machine and the data management center reviewed, verified, and validated the data. ©DataFax/DFdiscover addressed many of IDI's logistical challenges commonly encountered in the management of large multi-center clinical trials and in resource-limited settings (e.g., intermittent electricity). The system also supports Intelligent Character Recognition (ICR) and computerized receipt, logging and filing of data forms, after which several layers of record review occur. After data collection is completed in a given study, database lock is performed following in-depth data quality checking [4]. Locking or closing a database is a crucial practice that prevents unauthorized or unintentional changes after the final data entry, cleaning and analysis are complete.

Training of local teams to handle data
In July 2009, a data manager and a data reviewer, under the mentorship of a clinical trials manager from the NIAID International Centers for Excellence in Research (ICER) of the National Institutes of Health (NIH), were sent to South Africa for training in an existing data management unit using the same system (©DataFax/DFdiscover). The ultimate goal was to build a local data management center cognizant of the local research needs and designed to integrate into the research context and the local health systems [9]. The first two studies to use the data management unit included an autopsy study. Advantages of the system include:
- reduced data processing time through timely collection of CRFs;
- improved quality control through quality control notes that flag and track CRF problems;
- an improved two-way communication link between the study data management center and the participating clinical sites;
- a CRF imaging, storage and retrieval system;
- simple technology at the clinical sites;
- study management tools and reports;
- controlled access to study data and CRFs; and
- automated workflow management and increased efficiency through intelligent character recognition, thereby reducing personnel costs.

The cost of the IT infrastructure at the server end was estimated at USD 102,650 at the beginning of the project in 2008. However, through collaborations with the Office of Cyber Infrastructure and Computational Biology (OCICB) at the NIH, the cost was reduced by using existing servers outside the country, in Bethesda, Maryland. The NIH also provided a nightly back-up service, which assured both data security and storage. Figure 1 shows the data flow and data management architecture at the center.
On the management center side, computers with rapid processing speeds and sufficient RAM for quick processing of data images (Intel dual-core processor or higher, 1.0 GHz or higher, 1 GB RAM or more) are required. Either laptops or desktops can be used, although thin clients have also proved to be a very good option because of their reduced power consumption.
High-speed and reliable internet connections at the processing center are critical and mandatory.
The key personnel needed at the management center include a team lead, who directly supervises personnel at the center; an accountability officer, who also has data management responsibilities; data managers, who are cross-trained as programmers; data reviewers; and medical coders. An IT technician was also needed, especially for the set-up of equipment at the study sites. The human resource structure must address process flow, since poor management will affect sites and communication with sites. Because there is direct contact between data personnel and Principal Investigators (PIs), this translates into better quality protocols and improved quality of data. Figure 3 shows the number of local PIs versus the number of international PIs whose data have been managed by the data management centre. Figure 4 shows the worldwide distribution of study sites managed by the data management centre. Figure 5 shows the number of CRF pages submitted over the past 10 years. A total of 850,869 records were in the database as of December 2019.
The data management center, in close collaboration with the research and statistics departments, produces detailed validation reports, descriptive statistics, and plots for the investigators [10]. Access to the data, real-time data cleaning, and timely database completion have led to improved publication productivity, as shown in Figure 6.
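The descriptive summaries produced for investigators can be sketched in a few lines: for each study variable, report the sample size, missingness, and central tendency. This is a hedged illustration of the general idea, not the center's actual reporting pipeline, and the variable name and values are hypothetical.

```python
import statistics

def describe(values):
    """Summarize one study variable: counts, missingness, and
    central tendency over the complete (non-missing) values."""
    complete = [v for v in values if v is not None]
    return {
        "n": len(values),
        "n_missing": len(values) - len(complete),
        "mean": round(statistics.mean(complete), 1),
        "median": statistics.median(complete),
    }

# Hypothetical variable with two missing observations
cd4_counts = [350, 420, None, 510, 275, None, 390]
print(describe(cd4_counts))
```

Reporting missingness alongside the statistics is the point: it lets investigators spot data-collection gaps at the sites while the study is still running, rather than at database lock.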

CHALLENGES AND KEY LESSONS
All the study sites are trained on the data collection process, focusing mainly on CRF completion. A key challenge with paper-based CRFs is that any change to a form creates difficulty in ICR data capture, review and analysis [3]. Although the site teams are encouraged to limit changes to the final versions of the CRFs, changes are still made and have resulted in database errors. Other data handling challenges included: high staff turnover, driven by repetitive tasks and high volume, which threatened sustainability; a shortage of well-trained biomedical research methodologists with data management skills in sub-Saharan Africa [7]; delays in sending out and receiving quality control reports as the volume of studies increased; vendor-supplied software with yearly licensing costs; and data security and back-up of data to protect against catastrophic loss.

RECOMMENDATIONS
A key component of research capacity building is investment in the systems that underpin successful, high-quality research. We were able to implement and grow a highly successful data management center, appropriate for our low-income country setting, that was scalable, high quality, and compliant with international regulatory requirements for randomized clinical trials. The current lack of investment in data management in SSA can be attributed to a general lack of awareness of the fundamental role of data management and biostatistics in the conduct of research [6]. Holding the data at the sites where the research is conducted can empower local researchers, provide preliminary data for new proposals and, most importantly, build capacity for scientific leadership. Training data managers and biostatisticians in SSA should be a capacity building priority [7].

DECLARATIONS Ethical Approval and consent to participate
Not Applicable.

Adherence to national and international regulations
Not applicable.