The number of confirmed COVID-19 cases divided by population size is often used as a coarse measure of the burden of disease in a population. However, this fraction depends heavily on the sampling intensity and the various test criteria used in different jurisdictions, and many sources indicate that a large fraction of cases tends to go undetected.
Estimates of the true prevalence of COVID-19 in a population can be made by random sampling. Here I use simulations to explore confidence intervals of prevalence estimates under different sampling strategies, and to identify optimal sample sizes and degrees of sample pooling at a range of true prevalence levels.
Sample pooling can greatly reduce the total number of tests required for prevalence estimation. In low-prevalence populations, it is theoretically possible to pool hundreds of samples with only marginal loss of precision. Even when the true prevalence is as high as 10%, it can be appropriate to pool up to 15 samples, although this comes at the cost of not knowing which patients were positive. Sample pooling can be particularly beneficial when the test has imperfect specificity, because it can provide more accurate estimates of the prevalence than an equal number of individual-level tests.
Sample pooling should be considered in COVID-19 prevalence estimation efforts.
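As a rough illustration of the pooling argument, the Python sketch below simulates pooled testing and recovers the prevalence from the fraction of positive pools. It is not the simulation code used in this study: all function names and parameters are hypothetical, and it assumes that individuals are independent, that every pool has the same size, and that test sensitivity and specificity do not depend on pool size (no dilution effect).

```python
import random


def pool_positive_prob(p, k, sensitivity=1.0, specificity=1.0):
    """Probability that a pool of k samples tests positive at prevalence p."""
    all_negative = (1.0 - p) ** k  # chance that no sample in the pool is infected
    return sensitivity * (1.0 - all_negative) + (1.0 - specificity) * all_negative


def estimate_prevalence(positive_pools, n_pools, k, sensitivity=1.0, specificity=1.0):
    """Invert pool_positive_prob at the observed fraction of positive pools."""
    q = positive_pools / n_pools
    all_negative = (sensitivity - q) / (sensitivity + specificity - 1.0)
    all_negative = min(1.0, max(0.0, all_negative))  # keep the estimate in [0, 1]
    return 1.0 - all_negative ** (1.0 / k)


def simulate_estimates(p, n_pools, k, n_reps=2000,
                       sensitivity=1.0, specificity=1.0, seed=1):
    """Sorted prevalence estimates from n_reps simulated surveys."""
    rng = random.Random(seed)
    q = pool_positive_prob(p, k, sensitivity, specificity)
    estimates = []
    for _ in range(n_reps):
        # Under the independence assumption, each pool is positive with probability q.
        positives = sum(rng.random() < q for _ in range(n_pools))
        estimates.append(
            estimate_prevalence(positives, n_pools, k, sensitivity, specificity)
        )
    return sorted(estimates)


def central_interval(sorted_values, level=0.95):
    """Empirical central interval of a sorted list of estimates."""
    n = len(sorted_values)
    lower = sorted_values[int((1.0 - level) / 2.0 * n)]
    upper = sorted_values[int((1.0 + level) / 2.0 * n) - 1]
    return lower, upper


if __name__ == "__main__":
    # Compare 200 individual tests with 200 pooled tests of 10 samples each
    # at an assumed true prevalence of 1%.
    for k in (1, 10):
        estimates = simulate_estimates(p=0.01, n_pools=200, k=k, n_reps=500)
        print(f"pool size {k}: 95% interval {central_interval(estimates)}")
```

With the number of tests held fixed, the interval for the pooled design can be compared directly with the one for individual testing, and passing a `specificity` below 1 gives a first impression of how false positives affect the two designs.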