Lesionia: a digital data management system for epidemiological and clinical data collected from patients suspected for cutaneous leishmaniasis

Background. Digital systems for data management (DSDM) are now considered highly important in the field of biomedical sciences. Such systems help ensure that data meet the FAIR standards (Findability, Accessibility, Interoperability and Reusability). Our group is interested in implementing a DSDM for data collected from patients suspected of having cutaneous leishmaniasis (CL) in the frame of diagnostics evaluation. The data are collected in multiple sites and countries by different partners in the frame of a project supported by the USAID-NAS PEER program. We capitalized on the thorough clinical and field expertise of some partners to assess needs. Then, we further refined these needs consortium-wide to define the data to be collected by the clinicians and biologists during the data life cycle. This led to the development of a questionnaire form for data collection and the implementation of a web-based application, called Lesionia. Results. Based on the questionnaire, we developed Lesionia, a digital system for the management and the analysis of clinical and epidemiological data. It consists of a relational database and a web-based user interface (WUI). The database was conceived to be expandable to new collaborators and projects. It allows for data handling from the consented patient interview and sample collection to sample storage and investigation. The WUI permits data entry, fetching, visualization and analysis. Rigorous controls on data entry were implemented to reduce discrepancies. The WUI also offers a set of analysis tools that range from descriptive statistics to variable correlation analysis. Lesionia is accessible in a secure manner to all users of the consortium through a web browser connected to the Internet. Conclusion. Lesionia is a valuable tool for clinical and epidemiological data management. 
It is an open source software that can broadly serve the scientific community interested in studying, controlling, reporting and diagnosing CL and similar cutaneous diseases.


Background
Leishmaniasis is a group of widely distributed vector-borne parasitic diseases, endemic in more than 98 countries worldwide. Cutaneous leishmaniasis (CL) is considered by the World Health Organization (WHO) as one of the most neglected tropical diseases (NTDs). It is a group of cutaneous diseases with high morbidity rates (https://www.who.int/leishmaniasis/en/). CL is associated with illiteracy, gender discrimination, weakness of the immune system and lack of resources, amongst other factors, and it inflicts a major social burden (Bennis et al., 2018). Nonetheless, CL cases are underestimated in endemic regions due to diagnosis and reporting issues (Bailey et al., 2017), including in countries of the MENA region and sub-Saharan Africa (Alam et al., 2016; Tabbabi, 2019). Available data on the global occurrence of CL are sparse (Pigott et al., 2014), which impedes both the evaluation of the disease burden and the implementation of control strategies (Bailey et al., 2017). More broadly, epidemiological studies of CL often lack robust and high-quality data, as is the case for most NTDs (Ali et al., 2006; Jajosky et al., 2004). Implementing digital systems for data management (DSDM) has been part of the solution for better data quality in the case of malaria and dengue fever (Einsen et al., 2011; Thomsen et al., 2016), as well as in studying the spread of Schistosoma japonicum in China (Gray et al., 2009). The introduction of DSDM has been assessed in different types of studies, ranging from geo-spatial mapping of dengue fever spread in Sri Lanka (Lwin et al., 2019) to surveillance of disease outbreaks in Germany (Krause et al., 2007). The positive impact of such systems on the management of chronic diseases has also been assessed (Roshanov et al., 2011; Michael et al., 2013). 
For the specific context of epidemiological studies of parasitic diseases, Gray et al. (2009) presented a DSDM designed to process data collected during field-based surveillance of the transmission of bovine Schistosoma japonicum in China. The DSDM consists of a database with a multi-user interface developed using Microsoft Access, VBA and SQL. The authors demonstrated that digitizing the data collection and management processes improves data quality and, in turn, study reliability (Gray et al., 2009). Implementing a similar DSDM for the collection and management of CL-related data is of high importance in addressing the underreporting of CL cases worldwide and in mapping the spread and burden of the disease. The present study focuses on the development of a DSDM specific to CL cases, called Lesionia. It was initially developed for the needs of the PEER518 project consortium, but can easily scale up to other consortia and cutaneous diseases. The project aims at developing species-specific point-of-care diagnostics for CL. These molecular tools will be tested and validated on clinical samples collected in three countries of the MENA region, namely Tunisia, Morocco and Lebanon. The project consortium includes nine institutions based in five countries: Tunisia, Morocco, Lebanon, Mali and the USA. The central node of the consortium is the Laboratory of Molecular Epidemiology and Experimental Pathology Applied to Infectious Diseases (MEEP-lab) at Institut Pasteur de Tunis (IPT). Lesionia consists of a database and a web-based user interface (WUI). It offers a digital platform for clinical and epidemiological data management for CL cases across multiple institutions and countries, along with a tool for experimental data handling and analysis.

Data collection
In the frame of the PEER518 project, patient recruitment is taking place in five sites located in four countries (Supplementary Table S1). Patients are informed and their written consent is obtained. Clinical samples from patients suspected of having cutaneous leishmaniasis are collected, anonymously coded, then transferred to the MEEP-lab at IPT, along with related data. At the MEEP-lab, species identification is performed or confirmed using different molecular techniques, the results are recorded and the samples are stored. Classically, clinicians and practitioners use a simple form that differs from one hospital to another. Prior to implementing the database, and in order to harmonize the data collection process within the consortium and enhance data quality, a discussion took place between clinicians, biologists and bioinformaticians. It aimed at defining the data to be collected from patients and their respective samples, and how these data were to be used. This led to the definition of a questionnaire form (Figure 1). It included patient demographic data, residency and travel history, environmental data related to reported insect bites and animals encountered in close proximity, clinical data including previous treatments if any, data on the clinical aspect of the lesion(s) suspected to be CL (CL can have different presentations), the sampling method and the diagnosis results. The latter include direct examination performed at the sampling site and a selection of molecular tests (PCR ITS, qPCR, RPA-LF) that are further performed at the MEEP-lab for species identification. The questionnaire was conceived with respect to the best practices of data collection (Burguess, 2001; Kelley et al., 2003). First, the order and phrasing of the questions were established with practitioners to ensure that they can go through them fluently during interviews. We verified that interviewers understand the questions consistently. 
Response fields were reduced to checkboxes with pre-coded responses with a supplementary field for a potential non pre-coded response when it applies. In fact, these formats minimize handwriting, and subsequently spelling and transcription errors during data entry into the database. Units were specified whenever necessary and a canonical format was provided for dates.
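
As an illustration of how pre-coded responses and a canonical date format make entries checkable, the following minimal Python sketch validates one questionnaire response. It is not taken from Lesionia: the list of sampling methods, the function name and the day-month-year date format are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical pre-coded choices; the actual questionnaire codes are not given in the text.
SAMPLING_METHODS = {"scraping", "aspirate", "biopsy", "other"}
# Assumed canonical date format (day-month-year).
DATE_FORMAT = "%d-%m-%Y"


def validate_response(method, other_text, visit_date):
    """Return a list of problems found in one questionnaire response."""
    problems = []
    if method not in SAMPLING_METHODS:
        problems.append(f"unknown sampling method: {method!r}")
    # The free-text field is only required when the 'other' box is ticked.
    if method == "other" and not other_text.strip():
        problems.append("'other' selected but no description given")
    try:
        datetime.strptime(visit_date, DATE_FORMAT)
    except ValueError:
        problems.append(f"date not in DD-MM-YYYY format: {visit_date!r}")
    return problems
```

Calling `validate_response("biopsy", "", "15-03-2020")` returns an empty list, whereas a response with an unticked or misspelled method, or a date written in another order, is reported before it reaches the database.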

System architecture
The DSDM implemented for the needs of the consortium partners was called Lesionia. It contains multiple components, as shown in Figure 2. First, the database was designed and implemented using MySQL (Supplementary Figure S1). It includes data collected by the clinicians, data relative to biobanking in the MEEP-lab and data on the users of the system, their affiliations and functions. Each user has a unique login and password, provided by the system administrator for a secure connection. The back-end communicates with the database, which is hosted on a LEMP server, a variation of the LAMP software bundle that uses Nginx rather than Apache, also known as LNMP (Linux, Nginx, MySQL, PHP). The front-end is designed to be a web-based user interface (WUI) accessible from any web browser. The WUI communicates with a Shiny server hosted on the LEMP server, which submits data requests to the database. All servers and the database are physically hosted at the MEEP-lab (IPT). Both the front-end and back-end were developed using R. A list of all packages and dependencies is provided in the project repository on GitHub (https://github.com/Harigua/LEISIApp).
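
To make the relational layout described above more concrete, here is a minimal Python sketch of three linked tables (users, patients, samples). It is an assumption-laden illustration, not the Lesionia schema: all table and column names are hypothetical, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema; the real one is shown in Supplementary figure S1.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    login       TEXT NOT NULL UNIQUE,
    affiliation TEXT NOT NULL,
    role        TEXT NOT NULL
);
CREATE TABLE patients (
    patient_code TEXT PRIMARY KEY,   -- anonymous code, no personal identifiers
    site         TEXT NOT NULL,
    entered_by   INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE samples (
    sample_id    INTEGER PRIMARY KEY,
    patient_code TEXT NOT NULL REFERENCES patients(patient_code),
    method       TEXT NOT NULL,
    storage_box  TEXT                -- biobanking location, filled in later
);
"""


def open_demo_db():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With such a layout, a sample can only be stored for a patient code that already exists, and each patient record stays attached to the user who entered it. Adding a new collaborator or site then amounts to adding rows rather than changing the structure.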

The web user interface (WUI)
We developed the WUI using the Shiny framework implemented in R. It is accessible through www.lesionia.pasteur.tn/. It has a homepage that contains the login form and information on the PEER518 project, the consortium and how to contact the system administrator and/or the research team (Figure 3 (a)). Using a username and a password previously validated by the system manager, users can access four sections (Figure 3 (b)): (i) Data management, (ii) Data entry, (iii) Data viewer, (iv) Data analysor. The WUI includes a header that contains the application name 'Lesionia', the name of the currently logged-in user and a 'logout' button. The 'Lesionia' button allows the user to go back to the homepage from anywhere in the application and thus to switch between sections. Access to the "Data Management" section is restricted to the super user(s). It allows adding, editing or deleting users. It also allows deleting or downloading data using filters. "Data entry" is the section that includes all the digital forms and interfaces to enter new data or update existing data. A double-entry system was implemented to minimize the rate of discrepancy. The first entry consists of creating a unique ID for a patient. To continue entering the data, the user has to re-enter the unique ID, and an automatic check of the concordance of the two IDs is performed. If the second entry is not correct, the user is alerted and can correct it accordingly. If the first entry is not correct, the user should report this discrepancy using a dedicated field. Based on the discrepancy report, the system manager can intervene. For each section in the questionnaire, a digital form exists in Lesionia. These forms present multiple fields, of which the mandatory ones are tagged with an asterisk. Fields for which data are missing can be left blank. Default values are assigned beforehand to indicate that these fields correspond to missing data. 
These default values are "N/A" or "-1" for data of type character or integer, respectively. Missing data of type date are set to 01-01-1900. This is important for the forthcoming data analysis steps and for the assessment of the data collection procedures. All data in the questionnaire form can be entered in the corresponding tabs of the Lesionia interface (Figure 3 (c)). "Data viewer" allows the user to visualize, search and browse all the data within the tables of the database. It also provides statistics on the patient recruitment flow, breakdowns by nationality, gender and age class, and raw data visualization and download. "Data analysor" contains sections dedicated to data analysis, including: (i) General statistics, (ii) Correlation between the SPECIES variable and any other chosen variables, (iii) Regression (Linear model, Chi square) and (iv) Multiple Correspondence Analysis (MCA). Almost all the analyses are dynamic and can be customized by choosing the variables to be considered. All resulting graphs and figures can be directly downloaded. In order to assess the reliability of the "Data analysor" functionality, we simulated data for 3000 patients. For some data fields, specific distributions were applied according to preliminary statistics provided by our clinician partners (Supplementary Table S2). Otherwise, uniform distributions were applied. The sample size had no effect on the speed of the application. The statistical analyses obtained with the "Data analysor" were consistent with the sample distributions, thus confirming that all statistical tests were correctly implemented (Figure 4). Lesionia was implemented as open source software accessible through the following link (https://github.com/Harigua/LEISIApp/). To make this software further accessible to members of the scientific community willing to use it as is or customize it, we developed a user guide on how to install it on different operating systems (Linux, Mac and Windows). 
Lesionia can be installed either locally as a desktop application or as a web-based application (Supplementary file S1). It presents the following advantages: (i) error checking during data entry, (ii) the ability to report discrepancies to the system administrator, (iii) a built-in statistical analysis tool and (iv) easily expandable data fields, for instance for biological tests or studied species. On the other hand, it presents some disadvantages, namely: (i) it is closely tied to our questionnaire, so further development is needed to use it for other diseases, and (ii) no logs of data edits are saved in the current version, although this can be added.
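
The double-entry ID check and the default values for missing data described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, ignoring surrounding whitespace when comparing IDs is an assumption, and the actual application is written in R.

```python
from datetime import date

# Default values described in the text for missing data.
MISSING_TEXT = "N/A"
MISSING_INT = -1
MISSING_DATE = date(1900, 1, 1)


def ids_match(first_entry, second_entry):
    """Double-entry check: the re-typed ID must equal the first one.

    Ignoring surrounding whitespace is an assumption, not documented behaviour.
    """
    return first_entry.strip() == second_entry.strip()


def fill_missing(value, kind):
    """Replace an empty field with the default for its type ('text', 'int' or 'date')."""
    if value not in (None, ""):
        return value
    return {"text": MISSING_TEXT, "int": MISSING_INT, "date": MISSING_DATE}[kind]


def is_missing(value):
    """True when a stored value is one of the defaults, so analyses can exclude it."""
    return value in (MISSING_TEXT, MISSING_INT, MISSING_DATE)
```

Using explicit defaults rather than empty cells lets the analysis step separate "not recorded" from a real value: for example, ages stored as -1 can be filtered out with `is_missing` before computing age classes.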

Discussion
The positive impact of implementing digital systems for data management (DSDM) has now been thoroughly demonstrated in different fields of biomedical sciences (Roshanov et al., 2011; Xue-Jaun et al., 2018; Ngwatu 2018). DSDM provide sustainable systems for data management, reliable tools for data collection and efficient platforms for descriptive and predictive analytics (Lwin et al., 2019; Bell et al., 2018). As research data are often specific to their collection context and research objectives, there are limited possibilities for repurposing existing tools. Thus, implementing dedicated systems is often a requisite. In this context, a dedicated DSDM was implemented in a Chinese community with the objective of embedding epidemiological data collection on Schistosoma japonicum spread during onsite intervention processes (Gray et al., 2009). This system enabled researchers to produce good-quality data while maximizing the rate of digitizing data collection and reducing the costs of data cleaning. It assures data quality through a double-entry check and quality control metrics. It included a built-in statistical analysis functionality and an automated code generator. On the other hand, the system offers no interoperability and is restricted to the Microsoft Windows operating system. In this specific context, we herein present Lesionia as an open source pilot software for the collection, management and analysis of data relative to patients with cutaneous diseases such as CL. Lesionia has the advantage of allowing multi-user data entry, even for a single patient or sample. This is of high importance in smoothing the multi-actor process of data collection, from patient recruitment to sample storage and the performance of molecular tests. This advantage is valuable for harmonizing the multicentric process of experimental detection and identification of the CL-causing species. Lesionia is also an ethically sound application that uses anonymous coded data. 
Through the double-entry check for ID codes, it provides a reliable tracking system for data entries. These IDs also connect the digital entries to the paper forms that are kept by the clinicians. These paper forms include patients' personal data and clinical records that are confidential and are not shared within the consortium. Thus, they are not included in the database. These confidential data can only be used by the clinicians in case there is a need to contact patients during outbreaks or other public health issues. Lesionia also offers a built-in functionality for real-time data analysis. This is one of the most practical features, as it permits all users to run a variety of sophisticated statistical analyses in a user-friendly and coding-free fashion. Thus, users do not need to be experts in scripting or coding to generate the analyses. Another group of researchers recently published Epihak, a prototype of a DSDM developed through a health hackathon in Sri Lanka (Lwin et al., 2019). It has been implemented to study and combat dengue fever using an integrative framework that includes mobile technology. Epihak made it possible to digitize hospital forms and enhance the surveillance of dengue fever across Sri Lanka. It presented the advantage of using a mobile application, which increases the rate of field-based data collection. The authors demonstrated the impact of mobile technology use on the speed and accuracy of data collection, management and analysis as compared to classical methods that include paper forms and outdated digital applications. Although Lesionia can be accessed through a web browser from a mobile device (phone or tablet), the user experience is not optimal. We intend to deploy it in a next step as a mobile application, as these technologies facilitate field-based data collection processes. 
They also permit user-friendly access to built-in functionalities of the device, such as the camera, the location and the user identification (Fraser et

Conclusion and future directions
Lesionia is an innovative application for clinical and epidemiological data management of CL patients. It offers a multi-site and multi-user web-based application which enhances the efficiency of the data sharing process. It was implemented using interoperable languages, which makes it usable on different operating systems. As part of implementing reliable and sustainable data management systems in our lab and institute, Lesionia was also used in an institutional clinical investigation project at IPT. It can scale up to a more widely used system for managing and reporting cutaneous diseases in health care centers across the country and the region. It can also be reused for other multicentric trials or projects.

Consent for publication
Not applicable