This study offers a systematic assessment of widely accepted indicators of data quality and efficiency within the frame of digital engagement in a conflict-affected setting. Well-maintained original and final databases, together with rigorous record-keeping of costs incurred and hours logged, were key to successfully comparing the quality, efficiency and feasibility of CAPI versus PAPI in this survey setting. This study suggests that even in fragile and conflict-affected settings, data collected through CAPI can show improved quality and efficiency of collection and management compared to PAPI, although the initial programming investment for CAPI is costly. Even excluding upfront programming costs, CAPI was 2,584 USD more expensive than PAPI, although this difference could likely be mitigated through efficiencies gained from increased experience with the data collection software and by consolidating trainings. Caeyers et al. described a number of detailed scenarios in which CAPI is more cost-effective than PAPI elsewhere (Caeyers, Chalmers and Weerdt, 2010). For repeat surveys in secure settings, electronic programming is preferable, even if slightly more expensive, as it brings the data management unit more efficiently to the desired end state of electronic storage.
In our study, CAPI clusters showed an average time saving of 40 minutes per household compared to paper, and data management and cleaning alone required 87 fewer person-days overall in CAPI clusters (14.5 compared to 101.5). A comparison of CAPI and PAPI in households in Burkina Faso showed similar time savings in households (Di Pasquale, 2018). Both the pre-cleaned and final electronically collected datasets were slightly (< 1%) more complete than their paper counterparts, comparable to findings from a Kenyan study that reported 1% and 0.1% missing data in paper and electronic datasets, respectively (Njuguna et al., 2014). The Burkina Faso study found negligible differences in completeness, but duplicates appeared only in paper datasets (Di Pasquale, 2018). A comparison of PAPI with tablet-based CAPI in northwest Ethiopia also found data completeness to be superior in the electronically collected data (Zeleke et al., 2019). The same study showed that enumerators favored CAPI over PAPI for reasons similar to those in our study: improved efficiency of data collection, improved quality due to automatic skip patterns and data quality checks, faster data transfer, and the greater convenience of transporting tablets.
However, CAPI costs exceeded PAPI costs by 68%, almost entirely due to the upfront electronic programming costs of 19,634 USD. These costs far exceeded what other studies have reported. A Kenyan study assessing the costs of PAPI and CAPI found that establishing a smartphone data collection system was 9.4% more expensive than PAPI, yet over the subsequent two years CAPI was 7% less expensive to operate (Njuguna et al., 2014). The Ethiopian study also reported that enumerators appreciated the opportunity to improve their IT skills through CAPI (Zeleke et al., 2019). Despite higher upfront costs, in the long run CAPI may be a worthwhile investment in relatively secure areas, saving data collection time in the field and data management time in the office; time is itself a valuable resource. Over time, trainings may become shorter as trainees grow increasingly familiar with digital technology, and for repeat surveys, programming time should remain very low.
Interviews on acceptability highlighted a number of relevant concerns around the use of digital data in fragile settings: threats to the person(s) carrying the tablet and respondent hesitations were discussed in equal measure. Similar concerns were raised by enumerators in the Ethiopian study (Zeleke et al., 2019) and by researchers in a South African study (Tomlinson et al., 2009). These issues underscore the need to mitigate security risks for field staff using electronic devices in fragile settings and to sensitize communities in advance about the safe storage of electronic data. Local leaders are often engaged prior to household surveys for a similar purpose, yet announcing in advance that field staff will pass through an area carrying valuable electronic devices may further expose them to risk (Tomlinson et al., 2009).
The efficiencies and user acceptance described above add evidence to the argument for further expanding digital technology in survey settings, including those affected by insecurity. But the complete replacement of paper by digital technology is unlikely to occur quickly: according to Adner and Kapoor's 'War Between Ecosystems' paradigm, the two systems currently enjoy a 'robust coexistence' dynamic (Adner and Kapoor, 2016). This ecosystem enables both the piloting of digital systems where it is safe to do so and the continued use of the 'old technology' of paper. Survey planners are well acquainted with the process of developing paper-based questionnaires and entering paper data electronically, which has been extensively documented in detailed methodological open-source tools and guidelines. Current trends in the technological and political environment suggest that Afghanistan will remain in quadrant two, meaning that digital technology is unlikely to replace paper in the coming years. Challenges abound: digital data collection issues such as securing stable internet connectivity, programming forms, and resolving technological errors usually require specialized knowledge and training. Insecurity also undoubtedly slows progress towards the next evolution of the ecosystem.
Limitations
A number of study limitations affected the results presented here. Because the study was conducted as an operational research add-on to an existing household survey, we did not prioritize randomization of household clusters to the CAPI and PAPI groups; we therefore cannot attribute observed differences to the mode of collection. However, the qualitative data do provide some evidence that improvements in timeliness can be attributed to CAPI, as data collectors found that CAPI reduced the time spent per household. A systematic error affected the calculation of a question in the immunization module: an incorrect skip pattern in the paper questionnaire, and subsequently mis-programmed skip logic in the electronic version, meant that about 40% of responses were missed for this question. A similar mistake led to high missingness for a family planning question in the women's module. Additionally, time stamps for coding and data entry of paper data were not logged using a precise and consistent approach across all DMU staff, so logged hours may have included tasks other than coding or data entry.
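To illustrate the mechanism behind such a skip-pattern error, the minimal Python sketch below uses hypothetical variable names and an invented eligibility condition, not the actual wording of our questionnaire: when the programmed relevance condition is narrower than the intended one, the question is silently skipped and the response is stored as missing.

    # Hypothetical sketch of a mis-programmed skip pattern.
    def should_ask_immunization(child):
        # Intended logic: ask for every child aged 12-23 months.
        intended = 12 <= child["age_months"] <= 23
        # Mis-programmed logic: an extra condition wrongly excludes
        # eligible children without a vaccination card.
        return intended and child["has_card"]

    children = [
        {"age_months": 14, "has_card": True},   # asked
        {"age_months": 18, "has_card": False},  # eligible but skipped
    ]
    for child in children:
        status = "asked" if should_ask_immunization(child) else "skipped (missing)"
        print(child, status)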
CAPI and PAPI cluster costs are also difficult to compare for several reasons. Because survey programming was carried out by developers with highly specialized skills, the programming costs calculated for CAPI clusters far outweighed the overall data management costs of PAPI clusters. In addition, an estimated 30% of CAPI programming time was spent on revisions, adding substantial costs that could theoretically have been avoided had all formatting specifications been clear upfront; in practice, however, achieving a perfect tool on the first attempt is often unrealistic given the challenges that emerge during tool design. Finally, because of the high upfront costs of programming CAPI questionnaires, CAPI data management and total costs appear much higher than for PAPI, even though CAPI still offered considerable time savings. Caeyers et al. suggested a simple formula to calculate the threshold number of forms above which CAPI becomes cost-effective enough to justify programming expenses, and pointed out that an anticipated repeated or modified version of the survey should prompt planners to consider CAPI (Caeyers, Chalmers and Weerdt, 2010).
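As a rough sketch of the underlying logic (our notation and worked numbers are illustrative, not the authors' exact formula): if F is the fixed upfront cost of programming the electronic questionnaire, and c_PAPI and c_CAPI are the variable costs per completed form under each mode, the break-even number of forms is

    n* = F / (c_PAPI - c_CAPI)

so that, for example, with F = 19,634 USD and a hypothetical per-form saving of 5 USD, CAPI would pay for itself after roughly 3,927 forms. The formula only applies when per-form CAPI costs are genuinely lower; in a single round of our survey, where CAPI remained more expensive per form, the investment must instead be justified by anticipated repeat rounds.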
Furthermore, future studies would benefit from including four additional indicators: internal consistency of the final dataset, number of duplicate records, timeliness of data availability, and time for data coding and second-step (verification) entry of electronically collected data (Table 2). Internal consistency was not monitored using a systematic approach; however, we programmed skip logic and field data type constraints into the electronic questionnaire to minimize user error. The comparison of data completeness and anecdotal evidence suggest that CAPI performed better than paper in this regard. For example, six households in paper clusters that did not report having any children under the age of five nevertheless contained data on under-five illness, while no electronic cluster household showed such an inconsistency. Eleven respondents from paper clusters reported that a child was born dead but later responded that the same child was still alive; no similar irregularities were observed in the electronic cluster data. The same pattern was found in the Ethiopian study, in which invalid entries or errors were found in nearly half of the paper-based surveys compared to one-third of the electronically collected data (Zeleke et al., 2019).
Table 2. Operationalization table for evaluation of digital data collection, management and analysis.
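The cross-module consistency checks described above are straightforward to express in code. A minimal sketch using pandas, with hypothetical column names:

    import pandas as pd

    # Hypothetical household roster and child illness module.
    households = pd.DataFrame({"hh_id": [1, 2, 3],
                               "num_under5": [0, 2, 0]})
    illness = pd.DataFrame({"hh_id": [1, 2],
                            "under5_illness": [True, True]})

    # Flag households reporting no children under five in the roster
    # but containing under-five illness records in the child module.
    merged = households.merge(illness, on="hh_id", how="left")
    print(merged[(merged["num_under5"] == 0)
                 & (merged["under5_illness"] == True)])  # flags hh_id 1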
The level of effort required to prepare CAPI data for coding and verification deserves to be underscored. CAPI still required a substantial time investment before survey modules could be merged for analysis. Due to user error during household data collection, incorrect line numbers were sometimes entered into some modules of the household interviews, resulting in errors during merging and software-generated record duplication. Although these issues were eventually resolved within two forty-hour work weeks, it was occasionally necessary to seek information from the field about specific households and individuals in order to merge survey modules correctly. Even with CAPI, human error in data entry will lead to logical impossibilities during merging, which different software packages handle differently (Dickinson et al., 2019). Understanding how one's preferred software handles many-to-one or one-to-many merging errors is essential before using it to manage survey data.
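As an illustration of the kind of pre-merge check we have in mind (hypothetical data and column names; pandas is simply one package whose behavior we can describe), the validate argument of a pandas merge raises an error instead of silently duplicating records when the declared key relationship is violated:

    import pandas as pd

    # Household roster and women's module keyed on (hh_id, line_no).
    roster = pd.DataFrame({"hh_id": [1, 1], "line_no": [1, 2],
                           "age": [34, 29]})
    # A mistyped line number creates a duplicate key in the women's module.
    women = pd.DataFrame({"hh_id": [1, 1], "line_no": [1, 1],
                          "parity": [2, 3]})

    # validate="one_to_one" makes pandas raise MergeError rather than
    # silently duplicating the roster record for each matching woman.
    try:
        roster.merge(women, on=["hh_id", "line_no"], validate="one_to_one")
    except pd.errors.MergeError as err:
        print("merge failed:", err)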