Summary of main results
Our simulations found that the Square method performed best, as measured by its lower ranks and RMSE ratios. Under some circumstances, the Peri and Quad approaches, which sample from two or four areas of the towns, respectively, performed quite well: better than the standard EPI method (confirming previous simulations [3]), but not as well as the GPS techniques. The NewEPI generally improves on the OldEPI, especially when estimating prevalence.
The superiority of the Square technique was confirmed by estimating RMSEs in relation to population parameters, which revealed no particular parameter (i.e., no population type) for which another method was better.
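As an illustration of how an RMSE ratio can be computed from simulated replicates, the sketch below assumes the ratio is the RMSE of one method divided by the RMSE of a reference method, both measured against the known population value. This definition, the function names and the replicate values are hypothetical and are not taken from our simulation code.

```python
import math

def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around the true population value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def rmse_ratio(method_estimates, reference_estimates, true_value):
    """Assumed definition: RMSE of a method divided by RMSE of a reference method.
    Values below 1 favour the method over the reference."""
    return rmse(method_estimates, true_value) / rmse(reference_estimates, true_value)

# Hypothetical replicate prevalence estimates from two sampling methods
true_prevalence = 0.30
square = [0.29, 0.31, 0.30, 0.32, 0.28]
old_epi = [0.25, 0.36, 0.27, 0.34, 0.31]
print(round(rmse_ratio(square, old_epi, true_prevalence), 3))
```

Because the population is simulated, the true value is known exactly, so RMSE captures both bias and sampling variability in a single number.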
Commentary
Some of the procedures proposed to overcome the known limitations of the original EPI (‘OldEPI’) were improvements, but had their own limitations. For example, our results suggest that homogeneity within segments will produce relatively high design effects, and that increasing the sample size within segments will produce only a modest improvement in precision. This also applies to the NewEPI approach, which in effect segments the whole population into small Enumeration Areas (EAs). Moreover, it requires prior knowledge of the EAs, not just approximate data on each town’s population. It would be possible to adapt some approaches where no information on the target population exists, for example, in a recently formed informal refugee camp. Such a method could use drones or other technology to ensure that any aerial images needed are up to date. This would be even more feasible if software could recognize buildings or tents on the ground.
We did not include simulations in which we varied the number of towns/EAs in our samples. Given our results, though, we can comment on the likely impact of doing so. Methods that take their samples from small areas, where nearby households tend to be similar, are likely to have large design effects, especially if they take relatively large samples per cluster.
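The link between within-cluster homogeneity, cluster sample size and precision can be sketched with the standard Kish approximation for equal-sized clusters, deff = 1 + (m − 1)ρ, where m is the number of households per cluster and ρ is the intraclass correlation. The approximation, the function names and the parameter values (30 clusters, ρ = 0.2) are illustrative assumptions, not results from our simulations.

```python
def cluster_deff(m, icc):
    """Kish approximation to the design effect for equal-sized clusters of m households."""
    return 1 + (m - 1) * icc

def effective_sample_size(n_clusters, m, icc):
    """Total number of households divided by the design effect."""
    return n_clusters * m / cluster_deff(m, icc)

# Hypothetical: 30 clusters with strong within-cluster homogeneity (ICC = 0.2)
for m in (7, 14, 28):
    deff = cluster_deff(m, 0.2)
    n_eff = effective_sample_size(30, m, 0.2)
    print(f"m={m:2d}  deff={deff:.1f}  effective n={n_eff:.1f}")
```

Under these assumed values, quadrupling the take per cluster from 7 to 28 households raises the effective sample size only from roughly 95 to roughly 131, which is the pattern described above for methods that sample many households from small areas.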
One possible disadvantage of the Square method is that in larger towns, reaching all the sampled households may require a good deal of travel, whereas EPI samples are concentrated in a small geographic area. At times, though, this may be an advantage if there are concerns about the security of interviewers. With the Square technique, interviewers can enter and leave areas quickly, rather than spending time finding and interviewing several households in a small neighbourhood.
Strengths and limitations of our work
Our study has several strengths. We attempted to create realistic populations whose characteristics varied across towns. We included multiple populations, which simulations using real data cannot do. We included many sampling methods, including some that have been proposed but, to our knowledge, not used in practice, as well as the Square and Circle approaches, which have not been included in previous simulations. We added the recently developed NewEPI method. Finally, we simulated Relative Risks; to our knowledge, only one previous study has done so [9].
Of course, our study has limitations. The populations are simulated, not real. The homogeneity in neighbourhoods was built in and may be greater than in real life. Still, similarity of nearby households is broadly realistic. We also note that our simulated samples were ideal and ignored logistical difficulties that real surveys experience [10]. These include the fact that population numbers are not exact (so PPS, probability proportional to size, sampling is really PPES, probability proportional to estimated size), that interviewer teams make decisions that may not strictly follow protocols, and that people may be out when interviewers call or may refuse to participate. However, we expect such problems to be similar in practice for all or most sampling methods.
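To make the PPS versus PPES distinction concrete, the sketch below performs systematic PPS selection of clusters using estimated sizes, the usual textbook procedure; it is not necessarily the selection routine used in our simulations, and the function names and town sizes are hypothetical. The computed inclusion probabilities are correct only relative to the estimated sizes, so with a fixed number of households per cluster the sample is self-weighting only if those estimates match the true sizes.

```python
import random

def systematic_pps(estimated_sizes, n_clusters, seed=None):
    """Systematic PPS selection of cluster indices based on estimated (not exact) sizes."""
    rng = random.Random(seed)
    total = sum(estimated_sizes)
    interval = total / n_clusters
    start = rng.random() * interval          # random start in [0, interval)
    selected, cumulative, i = [], 0.0, 0
    for k in range(n_clusters):
        point = start + k * interval
        # Advance until the cumulative size range of cluster i contains the point;
        # the index guard protects against floating-point overshoot at the end.
        while i < len(estimated_sizes) - 1 and cumulative + estimated_sizes[i] <= point:
            cumulative += estimated_sizes[i]
            i += 1
        selected.append(i)
    return selected

def inclusion_probabilities(estimated_sizes, n_clusters):
    """Nominal first-stage probabilities n * size / total (valid when each is at most 1)."""
    total = sum(estimated_sizes)
    return [n_clusters * s / total for s in estimated_sizes]

# Hypothetical estimated town populations
towns = [1200, 800, 4500, 300, 2600, 950]
print(systematic_pps(towns, 3, seed=1))
print([round(p, 3) for p in inclusion_probabilities(towns, 3)])
```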
Other criteria for comparison
Several criteria have been proposed for comparing sampling procedures [15]. Although they were originally applied to new approaches using Geographic Information System (GIS) technology, most of them are relevant here.
Coverage
The Square method relies on identifying buildings or households from aerial images. There are likely to be errors in such identification, both false positives (features misclassified as buildings) and false negatives (buildings misclassified and thus excluded from the set of buildings that can be sampled). Such errors are less likely with on-the-ground survey teams. The EPI methods, however, are likely to omit isolated homes: unless such a home is the first household selected, it will rarely be the next nearest neighbour of another.
Cost
If the main cost of a survey is travel to the cluster (town or EA), all methods will have similar costs. The Square method may require more travel within towns, which could be substantial, and costly, in large towns (cities).
Speed
Several stages of the surveys are common to all methods: obtaining population estimates for the clusters, conducting the interviews, cleaning and collating the data, and conducting the analysis.
For the NewEPI, far more information (data on Enumeration Areas) is needed before the survey can be done. It may be readily available in official files, but access may take time. In addition, survey teams must list all households in the cluster and either select all of them or choose a random sample; this can be done concurrently with the interviews themselves. The NewEPI manual projects a 12-month timetable from conception of a survey to its completion [#:page], far longer than is acceptable in emergencies. The Square method requires obtaining aerial images and identifying the sample from them. The older EPI methods require interviewer teams to identify buildings, which is likely to be fairly quick and can be combined with the interviews in a single trip to each cluster.
Degree of Interviewer Involvement
The OldEPI methods rely on interviewers to identify households in a random direction, randomly select one, and choose the next nearest households. All of these steps are subject to error. The NewEPI requires survey teams to list all households in an EA; errors may arise if the boundaries are unclear. The Square method depends in part on the precision of the GPS locators and on how readily teams can identify the selected building.
Control over probabilities of selection
The probabilities of selection can be estimated for the GPS methods and the NewEPI, but not for the older EPI methods. Because EAs are of similar size, the NewEPI yields similar sampling weights, and hence smaller standard errors, than methods such as the GPS techniques, whose sampling weights can vary considerably with population density.
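The cost of unequal weights can be sketched with Kish's approximate design effect due to weighting, deff_w = n Σw² / (Σw)², which equals 1 when all weights are equal and grows with their variability. This approximation ignores any correlation between weights and the outcome, and the function name and weight values below are hypothetical illustrations, not weights from our simulations.

```python
def weighting_deff(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical weights: similar EA sizes versus weights driven by varying population density
similar = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
variable = [0.2, 0.5, 1.0, 2.5, 4.0, 0.8]
print(round(weighting_deff(similar), 2), round(weighting_deff(variable), 2))
```

With these assumed values the design effect is roughly 1.00 for the similar weights and roughly 1.79 for the variable ones, so the same number of interviews would yield noticeably wider confidence intervals in the second case.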
Technical GIS Skills
Some sources of information ‘require advanced training to use properly’ [15:70]. Since the methods we have described are readily available and easy to use today, this is not a concern. Indeed, as technologies improve, the accuracy of identifying buildings should also improve.