This study offers a systematic assessment of widely accepted indicators of data quality and efficiency in the context of digital data collection in a conflict-affected setting. Well-maintained original and final databases, together with rigorous record-keeping of costs incurred and hours logged, were key to successfully comparing the quality, efficiency and feasibility of EDC versus PDC in this survey setting. This study suggests that even in fragile and conflict-affected settings, EDC can improve the quality and efficiency of data collection and management over PDC, although the initial programming investment for EDC is costly. Even excluding upfront programming costs, EDC was 2,584 USD more expensive than PDC, although this difference could likely be reduced through efficiencies gained from increased experience with the data collection software and by consolidating trainings. For repeat surveys in secure settings, electronic programming is preferable, even if slightly more expensive, because it brings the data management unit more efficiently to the desired end state of electronic storage.
In our study, EDC led to an average time saving of 40 minutes per household compared to paper, and for data management and cleaning alone, EDC saved 87 person-days overall (14.5 versus 101.5 person-days). A comparison of EDC and PDC in Burkina Faso reported similar time savings per household [13]. Both the pre-cleaned and final electronically collected datasets were slightly (< 1%) more complete than their paper counterparts, which is comparable to a Kenyan study that reported 1% and 0.1% missing data in paper and electronic datasets, respectively [14]. The Burkina Faso study found negligible differences in completeness, with duplicates occurring only in the paper datasets [13]. A comparison of PDC with tablet-based EDC in northwest Ethiopia also found data completeness to be superior in the electronically collected data [15]. The same study showed that enumerators favored EDC over PDC for reasons similar to those in our study: more efficient data collection, improved quality due to automatic skip patterns and data quality checks, faster data transfer, and the greater convenience of transporting tablets.
EDC costs exceeded those of PDC by 68%, although this was almost entirely due to the upfront electronic programming costs of 19,634 USD. These costs far exceeded those reported in other studies. A Kenyan study assessing the costs of PDC and EDC found that establishing a smartphone data collection system was 9.4% more expensive than PDC, yet over the subsequent two years EDC was 7% less expensive to operate than PDC [14]. The same Kenyan study also reported that enumerators appreciated the opportunity to improve their IT skills through EDC [14]. Despite the similarity in overall costs, investment in EDC may be worthwhile in the long run in relatively secure areas, saving data collection time in the field and data management time in the office. Over time, trainings may become shorter as trainees grow more familiar with digital technology, and for repeat surveys, programming time should remain very low. Time, too, is a valuable resource.
Interviews on acceptability highlighted a number of relevant concerns around the use of digital data in fragile settings: threats to the person(s) carrying the tablet and respondents' hesitations were discussed in equal measure. Similar concerns were raised by enumerators in the Ethiopian study [15] and by researchers in a South African study [16]. These issues point to the need to mitigate security risks for field staff using electronic devices in fragile settings and to sensitize communities in advance about the safe storage of electronic data. Local leaders are often engaged before household surveys for a similar purpose, yet announcing in advance that field staff will pass through an area carrying valuable electronic devices may further expose those staff to risk [16].
The efficiencies and user acceptance described above add evidence to the argument for further expanding digital technology in survey settings, including those affected by insecurity. However, a complete replacement of paper by digital technology is unlikely to occur quickly: according to Adner and Kapoor's 'War Between Ecosystems' paradigm, the two systems currently enjoy a 'robust coexistence' dynamic [17]. This ecosystem enables both the piloting of digital systems where it is safe to do so and the continued use of the 'old technology' of paper. Survey planners are well acquainted with developing paper-based questionnaires and electronically entering paper data, processes that are extensively documented in detailed open-source methodological tools and guidelines. Current trends in the technological and political environment suggest that Afghanistan will remain in quadrant two of this framework, meaning that digital technology is unlikely to replace paper in the coming years. Challenges abound: accessing stable internet connectivity, programming forms, and resolving technological errors usually require specialized knowledge and training. In addition, insecurity undoubtedly slows progress towards the next ecosystem evolution.
A number of study limitations may affect the results presented here. First, one systematic error affected the calculation of a question in the immunization module: an incorrect skip pattern in the paper questionnaire, reproduced in the programmed skip logic of the electronic version, meant that about 40% of responses to this question were missed. A similar mistake led to high missingness for a family planning question in the women's module. Second, time stamps for coding and data entry of paper data were not logged in a precise and consistent way across all DMU staff, so logged hours could have included tasks other than coding and data entry.
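Skip-pattern errors of this kind can often be caught during piloting by auditing per-question missingness among respondents who should have been asked each question. The sketch below illustrates one possible approach in Python with pandas; the function, column names and eligibility rule are hypothetical and are not taken from the study's questionnaire or code.

```python
# Hypothetical sketch of a per-question missingness audit for pilot data.
# Column names and eligibility rules are illustrative, not the study's variables.
import pandas as pd

def missingness_report(df: pd.DataFrame, eligibility: dict) -> pd.DataFrame:
    """For each question, report the share of eligible respondents with no recorded answer."""
    rows = []
    for question, eligible in eligibility.items():
        answers = df.loc[eligible, question]
        rows.append({
            "question": question,
            "eligible_n": int(eligible.sum()),
            "missing_pct": round(100 * answers.isna().mean(), 1),
        })
    return pd.DataFrame(rows)

# Example usage (hypothetical): an immunization question asked of all children
# aged 12-23 months. A missing_pct far above the expected level would flag a
# faulty skip pattern before full data collection begins.
# children = pd.read_csv("pilot_children.csv")
# report = missingness_report(
#     children,
#     {"measles_vaccinated": children["age_months"].between(12, 23)},
# )
```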
Furthermore, future studies would benefit from including four additional indicators: internal consistency of the final dataset, number of duplicate records, timeliness of data availability, and time for data coding and second-step (verification) entry of electronically collected data (Table 2). Internal consistency was not monitored systematically; however, we programmed skip logic and field data type constraints into the electronic questionnaire to minimize user error. The comparison of data completeness and anecdotal evidence suggest that EDC performed better than paper in this regard. For example, six households in paper clusters that did not report having any children under the age of five nonetheless contained data on under-five illness, whereas none of the electronic cluster households showed such an inconsistency. Eleven respondents from paper clusters reported that a child was born dead but later indicated that the same child was still alive; no similar irregularities were observed in the electronic cluster data. The same pattern was found in the Ethiopian study, in which invalid entries or errors were found in nearly half of the paper-based surveys compared with one-third of the electronically collected data [15].
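Cross-module consistency checks of this kind could also be automated in future surveys. As a minimal sketch, the code below flags the two inconsistencies described above using Python and pandas; the column names (hh_id, num_under_five, u5_illness, born_dead, still_alive) are hypothetical and do not correspond to the study's actual variables.

```python
# Minimal, hypothetical sketch of automated internal-consistency checks across
# merged survey modules. Column names are illustrative only.
import pandas as pd

def consistency_issues(households: pd.DataFrame, children: pd.DataFrame) -> pd.DataFrame:
    """Flag households with contradictory information across modules."""
    issues = []

    # Check 1: households reporting no children under five that nevertheless
    # have under-five illness data recorded in the child module.
    no_u5_ids = households.loc[households["num_under_five"] == 0, "hh_id"]
    illness_ids = children.loc[children["u5_illness"].notna(), "hh_id"].unique()
    for hh_id in no_u5_ids[no_u5_ids.isin(illness_ids)]:
        issues.append({"hh_id": hh_id,
                       "issue": "under-five illness recorded but no under-fives reported"})

    # Check 2: children recorded both as born dead and as still alive.
    contradictions = children[(children["born_dead"] == 1) & (children["still_alive"] == 1)]
    for _, row in contradictions.iterrows():
        issues.append({"hh_id": row["hh_id"],
                       "issue": "child recorded as born dead and as still alive"})

    return pd.DataFrame(issues)
```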
Table 2. Operationalization Table for Evaluation of Digital Data Collection, Management and Analysis.
Limitations
The level of effort required to prepare electronically collected data for coding and verification deserves to be underscored. EDC still required a substantial time investment before survey modules could be merged for analysis. Due to user error during household data collection, incorrect line numbers were sometimes entered into some modules of the household interviews, resulting in merging errors and software-generated record duplication. Although these issues were eventually resolved within two forty-hour work weeks, it was occasionally necessary to seek information from the field about specific households and individuals in order to merge survey modules correctly. Even with EDC, human error in data entry will lead to logical impossibilities during merging, which different software packages handle differently [3]. Understanding how one's preferred software handles many-to-one or one-to-many merging errors is essential before using it to manage survey data.
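As a generic illustration of this point, the sketch below shows how a merge of two survey modules could be made to fail loudly rather than silently duplicate records; it assumes Python with pandas and hypothetical key columns (hh_id, line_number), not the software or variables used in this study.

```python
# Hypothetical sketch of defensive module merging: duplicated keys are reported
# before merging, and the expected one-to-one relationship is validated.
import pandas as pd

def merge_modules(household: pd.DataFrame, women: pd.DataFrame) -> pd.DataFrame:
    """Merge a women's module onto the household roster on (hh_id, line_number)."""
    keys = ["hh_id", "line_number"]

    # List duplicated keys (e.g. from mis-entered line numbers) instead of
    # letting the merge silently multiply records.
    dupes = women[women.duplicated(subset=keys, keep=False)]
    if not dupes.empty:
        raise ValueError(f"Duplicate merge keys in women's module:\n{dupes[keys]}")

    # validate="one_to_one" makes pandas raise a MergeError if either dataset
    # violates the expected relationship between modules.
    return household.merge(women, on=keys, how="left", validate="one_to_one")
```

Making the expected relationship explicit in this way surfaces many-to-one or one-to-many errors at merge time, rather than as unexplained duplicates discovered later.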