The framework was developed using the PHP v7.4 scripting language [24] and is composed of five modules: i) a metadata database and an Admin System; ii) a form converter; iii) an ETL (extract-transform-load) processor; iv) a data quality module; and v) the Ontology Services. Figure 1 shows an overview of the REDbox framework.
The metadata database and the Admin System
The web-based Admin System was developed in the C# [25] and JavaScript [26] programming languages to manage the mandatory metadata through create, read, update, and delete (CRUD) operations. Figure 2 presents the relational database model.
In general, an entry for a REDCap project must first be created (table redcap_project), including the Application Programming Interface (API) parameters, and then each of the project’s instruments must be registered (table redcap_forms). Table form_metadata stores the semantic mapping for the instruments’ fields. Finally, the following tables are used by the Data Quality Module: redcap_validation_types, redcap_validation_rules, redcap_validation_issues, redcap_visits and redcap_visits_config.
The form converter
Because each software builds instruments using a distinct standard, a converter is desirable so that the designer does not have to create the same form twice. Forms in REDCap can be created automatically, either by derivation from ontologies or by conversion of a form designed in the XLSForm standard.
To initiate the process, the user must upload the spreadsheet (.xls) or the ontology (.owl) file, fill in the form name, and choose between generating a .zip file, to be uploaded manually into REDCap, or automatically importing the form through the API. For the second option, the API token and URL must be provided. Figure 3 shows the user interface of the converter.
Deriving from ontologies. Each property of a given ontology can be converted to a field in a form. The name and type of a field are obtained from the name of the property and its associated type (text is the default type). Minimum and maximum values defined as restrictions on properties are also converted.
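The derivation described above can be illustrated with a minimal sketch. It is not the REDbox implementation: the `OntologyProperty` class, the `XSD_TO_VALIDATION` table, and the dictionary keys of the resulting field are hypothetical names chosen for illustration, and the ontology is assumed to have been parsed already.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from XSD datatype names to REDCap text-validation types.
XSD_TO_VALIDATION = {
    "date": "date_ymd",
    "dateTime": "datetime_ymd",
    "integer": "integer",
    "decimal": "number",
}


@dataclass
class OntologyProperty:
    """Illustrative stand-in for a property read from an .owl file."""
    name: str
    datatype: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def property_to_field(prop, form_name):
    """Build one form field (as a dict) from an ontology property."""
    return {
        "field_name": prop.name.lower(),
        "form_name": form_name,
        "field_type": "text",  # text is the default type
        # Unknown or missing datatypes yield a plain text field.
        "validation": XSD_TO_VALIDATION.get(prop.datatype or "", ""),
        # Restrictions on the property become the allowed range.
        "validation_min": "" if prop.min_value is None else str(prop.min_value),
        "validation_max": "" if prop.max_value is None else str(prop.max_value),
    }
```

For example, `property_to_field(OntologyProperty("start_date", "date"), "treatment")` would yield a text field with date validation and no range limits.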
Converting from XLSForms. The converter supports all common field types, such as text, date, date and time, time, integer, decimal, calculation, single selection, multiple selection, files and notes. These types of fields are converted as they are, including the variable name and the values assigned to options in single and multiple selections, so that instruments have a matching structure on both systems. Skip logic defined in KoBoToolbox is translated to REDCap branching logic, as are validation rules.
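A minimal sketch of this kind of conversion is given below, under stated assumptions. The table `XLSFORM_TO_REDCAP` and both function names are hypothetical; the choice of `radio` for single selection is one possible target (a dropdown would also fit), and the skip-logic translation covers only variable references and `selected()` calls, not the full XLSForm expression language.

```python
import re

# XLSForm question type -> (REDCap field type, REDCap text validation).
# Illustrative mapping; the converter's actual table is not given in the text.
XLSFORM_TO_REDCAP = {
    "text": ("text", ""),
    "date": ("text", "date_ymd"),
    "datetime": ("text", "datetime_ymd"),
    "time": ("text", "time"),
    "integer": ("text", "integer"),
    "decimal": ("text", "number"),
    "calculate": ("calc", ""),
    "select_one": ("radio", ""),
    "select_multiple": ("checkbox", ""),
    "file": ("file", ""),
    "note": ("descriptive", ""),
}


def convert_type(xlsform_type):
    """Map an XLSForm 'type' cell to a REDCap field type and validation."""
    # "select_one yes_no" carries the choice-list name after the keyword.
    keyword = xlsform_type.split()[0].lower()
    return XLSFORM_TO_REDCAP[keyword]


def convert_skip_logic(relevant):
    """Rewrite an XLSForm 'relevant' expression as REDCap branching logic."""
    # selected(${field}, 'code') -> [field(code)] = '1'
    logic = re.sub(
        r"selected\(\s*\$\{(\w+)\}\s*,\s*'([^']*)'\s*\)",
        r"[\1(\2)] = '1'",
        relevant,
    )
    # ${field} -> [field]
    return re.sub(r"\$\{(\w+)\}", r"[\1]", logic)
```

With these definitions, the expression `${age} >= 18 and selected(${checkbox_symptoms}, 'cough')` becomes `[age] >= 18 and [checkbox_symptoms(cough)] = '1'`.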
In the designing process, there is a particularity related to multiple selection questions (checkboxes). This type of question needs to have the field’s name starting with ‘checkbox_’. This is needed to ensure a correct identification of a multiple selection question structure during data transfer from KoBoToolbox to REDCap.
Before the conversion process starts, the converter module checks the naming convention. If any inconsistency is detected, the conversion fails and the user is informed of the detected error.
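Such a pre-check could be sketched as follows. The function name and the representation of the XLSForm survey sheet as a list of dictionaries with `type` and `name` keys are assumptions made for illustration.

```python
def check_checkbox_names(survey_rows):
    """Return a list of error messages; an empty list means the form passes."""
    errors = []
    for row in survey_rows:
        question_type = row.get("type", "")
        name = row.get("name", "")
        # Multiple selection questions must carry the 'checkbox_' prefix.
        if question_type.startswith("select_multiple") and not name.startswith("checkbox_"):
            errors.append(
                f"Multiple selection field '{name}' must start with 'checkbox_'"
            )
    return errors
```

A caller would abort the conversion and report the messages whenever the returned list is not empty.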
The ETL processor
After converting the instrument and transmitting it to REDCap, KoBoToolbox native REST Services must be enabled in the form settings to instantly submit collected data to the ETL processor through a POST request. The processor URL and basic HTTP authentication credentials must be provided.
The processor receives the data collected in KoBoToolbox as a JSON object, which is parsed to remove unnecessary elements that are not related to the data of interest. After verifying authentication credentials, the metadata is queried to obtain the URL and the token of the REDCap API (table redcap_projects) and to verify if it is the first form in the project (table redcap_forms). If it is, a request is sent to REDCap API to generate a new record ID, which means that it is a new participant in a research project. Otherwise, the record ID will be searched in the log of collected data, based on the participant identifier. Then, a request is sent to the REDCap API to import the data.
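The flow above can be sketched in three small steps. This is an illustration under assumptions, not the processor's code: the function names are hypothetical, the key prefixes treated as KoBoToolbox bookkeeping (`_`, `meta/`, `formhub/`) are an assumption about the submitted JSON, a plain dictionary stands in for the log of collected data, and `record_id` is assumed to be the name of the record identifier field. The request fields follow the REDCap API record import, and no network call is made here.

```python
import json


def clean_submission(submission):
    """Drop bookkeeping keys and strip group prefixes from field names."""
    cleaned = {}
    for key, value in submission.items():
        if key.startswith(("_", "meta/", "formhub/")):
            continue  # not part of the data of interest
        cleaned[key.split("/")[-1]] = value  # "group/field" -> "field"
    return cleaned


def resolve_record_id(is_first_form, participant_id, id_log, next_record_id):
    """Create a record ID for a new participant, or look up the existing one."""
    if is_first_form:
        # next_record_id stands in for the request that asks REDCap
        # to generate a new record ID.
        record_id = next_record_id()
        id_log[participant_id] = record_id
        return record_id
    return id_log[participant_id]


def build_import_payload(token, record_id, data):
    """Build the form fields of a REDCap API record import request."""
    row = {"record_id": record_id, **data}
    return {
        "token": token,
        "content": "record",
        "format": "json",
        "type": "flat",
        "data": json.dumps([row]),
    }
```

The resulting dictionary would be sent as the body of a POST request to the REDCap API URL stored in the metadata database.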
After the data is saved successfully, additional steps may take place depending on the settings defined for the instrument, such as sending e-mail notifications (to both the respondent and the research team), checking for duplicate records, and instantly locking the saved record (to avoid changes in the data). These features may facilitate the management of research data.
Once the data is in the REDCap database, changes in records are monitored through the Data Entry Trigger module. When a change occurs, the processor exports the edited data from REDCap and logs it into the relational database.
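As an illustration, handling a Data Entry Trigger notification might look like the sketch below. The function names are hypothetical; the notification is assumed to arrive as a form-encoded POST body carrying `project_id`, `instrument`, `record`, and the instrument's completion status, and the export request uses the REDCap API record export restricted to the changed record and instrument.

```python
from urllib.parse import parse_qs


def parse_trigger(body):
    """Extract the changed record from a form-encoded trigger notification."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    instrument = fields["instrument"]
    return {
        "project_id": fields["project_id"],
        "record": fields["record"],
        "instrument": instrument,
        # Present only in longitudinal projects.
        "event": fields.get("redcap_event_name"),
        # Completion status of the saved instrument.
        "status": fields.get(f"{instrument}_complete"),
    }


def build_export_payload(token, trigger):
    """Build the form fields of an export request for the edited data only."""
    return {
        "token": token,
        "content": "record",
        "format": "json",
        "type": "flat",
        "records[0]": trigger["record"],
        "forms[0]": trigger["instrument"],
    }
```

The exported rows would then be written to the log in the relational database.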
Data Quality Module
Data management is a continuous process and represents a critical phase in clinical research, due to its importance to the generation of high-quality and reliable data for statistical analysis, which should meet the parameters and requirements specified in the research protocol [27].
It is crucial that the management activities occur in parallel with the data collection. The data manager usually carries out a data validation process, which includes verifying the consistency, completeness and accuracy of the collected data. In this way, missing data are expected to be avoided and data quality to increase.
Most data are acquired during participants’ visits in health research. Therefore, keeping track of the schedule of visits and their status (carried out, not carried out, pending) is essential for not missing any milestone.
However, all of these tasks are time-consuming because they demand careful inspection of a significant amount of data. The REDCap software natively offers useful tools to help data managers and researchers, such as the Resolution Workflow and the Scheduling features. The former allows queries to be opened to request verification of the collected data, and the latter assists in scheduling the expected visits for participants during the study (although it requires a manual setup for each participant).
The Data Quality Module is composed of two functionalities that can complement the ones offered by REDCap, focusing on the reduction of the workload for data managers and researchers.
First, there is an automatic rule-based validation procedure that goes through each field in all instruments searching for any inconsistency. Rules must be pre-defined as metadata and they represent the format or range of values expected for a given field. The procedure runs several times a day to check, at the same time, for new issues and to verify the resolution of previously identified ones. When an issue is detected, a query is opened in the Resolution Workflow (in REDCap) and the data collector is alerted by e-mail. Figure 4 presents the dashboard with an overview of all issues detected in a REDCap project.
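A rule-based check of this kind can be sketched as follows. The rule structure (a `kind` of `range` or `format`, plus an optional `required` flag) and all function names are assumptions for illustration; the actual layout of the validation metadata tables is not described in the text.

```python
import re


def check_value(value, rule):
    """Return an issue message, or None when the value satisfies the rule."""
    if value == "":
        return "missing value" if rule.get("required") else None
    if rule["kind"] == "range":
        try:
            number = float(value)
        except ValueError:
            return f"value {value!r} is not a number"
        if not rule["min"] <= number <= rule["max"]:
            return f"value {value} outside [{rule['min']}, {rule['max']}]"
    elif rule["kind"] == "format":
        if re.fullmatch(rule["pattern"], value) is None:
            return f"value {value!r} does not match the expected format"
    return None


def validate_record(record, rules):
    """Check every field that has a rule; return {field: issue message}."""
    issues = {}
    for field, rule in rules.items():
        message = check_value(record.get(field, ""), rule)
        if message:
            issues[field] = message
    return issues


def reconcile(open_issues, current_issues):
    """Compare two sets of (record, field) pairs from consecutive runs."""
    new = current_issues - open_issues       # a query should be opened
    resolved = open_issues - current_issues  # the query can be closed
    return new, resolved
```

Running `reconcile` on each pass covers both goals of the procedure: it separates newly detected issues from previously identified ones that have since been corrected.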
Additionally, a panel was developed to provide a quick visualization of all upcoming participants’ visits. Each row in the panel is a participant and each column is a visit. The color of a cell represents the status of the visit (green: carried out; red: not carried out; yellow: pending/waiting for the participant). Dates are calculated based on a reference date field (e.g., the day of an intervention or inclusion in the study) and on the days offset for each event. This information is also stored as metadata.
The panel is created in real time with online data extracted from the REDCap database, saving time for researchers, who usually create their own panels using spreadsheets. Figure 5 shows the panel for a study with 21 visits (project IV in table 2).
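The date and status calculation behind one row of such a panel can be sketched as below. The function names are hypothetical, and the rule that a visit counts as not carried out once its expected date has passed is an assumption; the text does not state how the panel separates pending from missed visits.

```python
from datetime import date, timedelta


def visit_status(reference, offset_days, done, today):
    """Return the expected date of a visit and the color of its cell."""
    due = reference + timedelta(days=offset_days)
    if done:
        return due, "green"   # carried out
    if today > due:
        return due, "red"     # not carried out (assumed: expected date passed)
    return due, "yellow"      # pending / waiting for the participant


def panel_row(reference, offsets, completed, today):
    """Build one participant row: {visit name: (expected date, color)}."""
    return {
        visit: visit_status(reference, offset, visit in completed, today)
        for visit, offset in offsets.items()
    }
```

Here `offsets` plays the role of the days-offset metadata, and `reference` is the value of the reference date field for the participant.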
Ontology Service
The solution offers a service that provides practical tools to enhance the use of ontologies in the system and to allow the continuous integration of different data sources. The service is able to adapt to the evolution of ontologies, ensuring availability and avoiding data loss.
As previously stated, the form converter is able to derive an instrument from an ontology. In a similar way, this service enables the creation of an ontology based on an instrument. This feature relies on an external application, namely the D2R Server [28,29]. The D2R is a tool that converts relational content into semantic formats, allowing a quick conversion by automatically creating ontologies based on the schema of the content.
Relying on this feature, REDbox can define an ontology from a data collection instrument. To achieve this, a temporary table is created on a relational database, where each column represents a field in the instrument. Then, the D2R generates and publishes an ontology using the table structure, i.e., converting columns to properties, which can be later customized. Table 1 presents an example of an ontology generated from an instrument containing patient’s treatment data.
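The first step, creating a table whose columns mirror the instrument's fields, can be sketched as below. SQLite is used here only as a self-contained stand-in for the relational database read by D2R; the function name, the `REDCAP_TO_SQL` table, and the `record_id` key column are assumptions for illustration.

```python
import sqlite3

# Hypothetical mapping from field validation types to SQL column types.
REDCAP_TO_SQL = {"date_ymd": "DATE", "integer": "INTEGER", "number": "REAL"}


def create_instrument_table(conn, table, fields):
    """Create a table with one column per instrument field.

    `fields` is a list of (field name, validation type) pairs. Names are
    assumed to be trusted identifiers taken from the instrument definition.
    """
    columns = ", ".join(
        f'"{name}" {REDCAP_TO_SQL.get(validation, "TEXT")}'
        for name, validation in fields
    )
    conn.execute(f'CREATE TABLE "{table}" (record_id TEXT PRIMARY KEY, {columns})')


conn = sqlite3.connect(":memory:")
create_instrument_table(
    conn, "treatment", [("start_date", "date_ymd"), ("clinical_form", "")]
)
```

A schema-to-ontology tool can then expose each column of the table as a property, which is the correspondence illustrated in Table 1.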
Table 1. Instrument and ontology correspondence
| Instrument: field | Instrument: type | Ontology: property | Ontology: range |
| --- | --- | --- | --- |
| TB treatment | | | |
| Start date | textbox with date validation | http://vocab.redbox.technology/vocab/treatment/start_date | Literal (date) |
| TB clinical form | Multiple choice with single answer | http://vocab.redbox.technology/vocab/treatment/clinical_form | Literal |
| Discharge date | textbox with date validation | http://vocab.redbox.technology/vocab/treatment/discharg_date | Literal (date) |
The Ontology Service guarantees semantic interoperability between applications and forms that use different versions of the same ontology, or even different ontologies, by maintaining the history of changes and mapping the concepts from one ontology version to another. It offers the following features: upload of a file containing the source terms of one ontology version and the corresponding target terms in the new version; upload of files annotated with one ontology version and their conversion to an older or newer version of the same ontology; and upload of a file marked up with one ontology and its conversion to a file of a correlated ontology that was previously aligned/mapped.
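The version conversion can be pictured with a small sketch. It assumes that the uploaded mapping file has already been read into a dictionary from source terms to target terms and that an annotated file reduces to a dictionary from field names to ontology terms; the function names and the example.org URIs are hypothetical.

```python
def convert_annotations(annotations, term_map):
    """Rewrite field annotations from one ontology version to another.

    Returns the converted annotations and the list of terms left unchanged
    because the mapping has no entry for them.
    """
    converted, unmapped = {}, []
    for field, term in annotations.items():
        if term in term_map:
            converted[field] = term_map[term]
        else:
            converted[field] = term  # keep the original term to avoid data loss
            unmapped.append(term)
    return converted, unmapped


def invert(term_map):
    """Reverse a one-to-one mapping to convert back to the older version."""
    return {target: source for source, target in term_map.items()}


v1_to_v2 = {"http://example.org/vocab/v1/start": "http://example.org/vocab/v2/start_date"}
annotations = {"start_date": "http://example.org/vocab/v1/start"}
converted, unmapped = convert_annotations(annotations, v1_to_v2)
```

Applying `convert_annotations` with `invert(v1_to_v2)` restores the original annotations, which corresponds to converting to an older version of the same ontology.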
Validation
The proposed solution was validated through its use in several cross-institutional research projects related to TB in Brazil, namely: i) Longitudinal Study of the Impact of Social Support on Tuberculosis Indicators - ELISIOS; ii) Validation of the Line Probe Assay's performance as a rapid diagnosis method for drug-resistant tuberculosis in reference centers in Brazil; iii) Validation of Recombinant PPD in the Diagnosis of Tuberculosis Infection; and iv) ProBCG - Use of the Bacillus Calmette–Guérin (BCG) vaccine as prevention of COVID-19 in health professionals. Table 2 shows the characteristics of the projects that are currently using the framework.
Table 2. Characteristics of the projects currently using REDbox
| Project | No. of instruments | No. of fields | Expected no. of records |
| --- | --- | --- | --- |
| I | 4 | 175 | 2500 |
| II | 10 | 597 | 3800 |
| III | 4 | 132 | 1020 |
| IV | 20 | 472 | 1000 |
| Total | 38 | 1376 | 8320 |
Each project has a significant number of instruments and fields. The form converter module is therefore crucial in this scenario, since each form needs to be designed only once in KoBoToolbox and is then converted to the REDCap format. The expected number of records is also significant, which may demand the use of easy-to-use and offline tools.