COVID-19 prevalence estimation by random sampling in the population - Optimal sample pooling under varying assumptions about true prevalence

DOI: https://doi.org/10.21203/rs.3.rs-32082/v1

Abstract

Background

The number of confirmed COVID-19 cases divided by population size is used as a coarse measure of the burden of disease in a population. However, this fraction depends heavily on the sampling intensity and the testing criteria used in different jurisdictions, and many sources indicate that a large fraction of cases tends to go undetected.

Methods

Estimates of the true prevalence of COVID-19 in a population can be made by random sampling. Here I use simulations to explore confidence intervals of prevalence estimates under different sampling strategies, and to identify optimal sample sizes and degrees of sample pooling at a range of true prevalence levels.
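
The kind of simulation described here can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the assumption of a perfect test, the standard pooled estimator obtained by inverting P(pool positive) = 1 - (1 - p)^k, and the simple percentile interval are all assumptions made for illustration.

```python
import random


def simulate_pooled_estimate(true_prev, n_samples, pool_size, rng):
    """Simulate one survey and return a prevalence estimate.

    Assumes a perfect test (hypothetical simplification): a pool is
    positive exactly when at least one of its members is infected.
    """
    n_pools = n_samples // pool_size
    positive_pools = 0
    for _ in range(n_pools):
        # Draw each pool member independently with probability true_prev.
        if any(rng.random() < true_prev for _ in range(pool_size)):
            positive_pools += 1
    frac_pos = positive_pools / n_pools
    # Invert P(pool positive) = 1 - (1 - p)^k to recover p.
    return 1.0 - (1.0 - frac_pos) ** (1.0 / pool_size)


def percentile_interval(true_prev, n_samples, pool_size, n_reps=1000, seed=1):
    """Central 95% range of the estimate over repeated simulated surveys."""
    rng = random.Random(seed)
    estimates = sorted(
        simulate_pooled_estimate(true_prev, n_samples, pool_size, rng)
        for _ in range(n_reps)
    )
    lower = estimates[int(0.025 * n_reps)]
    upper = estimates[int(0.975 * n_reps) - 1]
    return lower, upper
```

Calling `percentile_interval(0.01, 2000, 10)` simulates 200 pooled tests per survey; repeating the call with other pool sizes shows how the interval width changes with the degree of pooling.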

Results

Sample pooling can greatly reduce the total number of tests required for prevalence estimation. In low-prevalence populations, it is theoretically possible to pool hundreds of samples with only marginal loss of precision. Even when the true prevalence is as high as 10%, it can be appropriate to pool up to 15 samples, although this comes at the cost of not knowing which patients were positive. Sample pooling can be particularly beneficial when the test has imperfect specificity, as it can provide more accurate estimates of the prevalence than an equal number of individual-level tests.
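
To make the role of test error concrete, the following sketch relates the fraction of positive pools to the underlying prevalence when sensitivity and specificity are imperfect. It is an illustration under stated assumptions rather than the estimator used in the study: errors are assumed to act at the pool level, sensitivity is assumed not to fall with dilution, and the function names are hypothetical.

```python
def pool_positive_prob(prev, pool_size, sensitivity=1.0, specificity=1.0):
    """Probability that a pool of pool_size samples tests positive.

    Assumption: the test errs at the pool level, with the same
    sensitivity and specificity regardless of pool size.
    """
    # Probability that the pool contains at least one infected sample.
    p_any = 1.0 - (1.0 - prev) ** pool_size
    # True positives plus false positives among truly negative pools.
    return sensitivity * p_any + (1.0 - specificity) * (1.0 - p_any)


def corrected_prevalence(frac_pos, pool_size, sensitivity=1.0, specificity=1.0):
    """Invert pool_positive_prob to estimate prevalence from observed data."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    # Remove the expected false-positive share, then rescale.
    p_any = (frac_pos - (1.0 - specificity)) / denom
    # Clamp to [0, 1] because sampling noise can push the value outside.
    p_any = min(max(p_any, 0.0), 1.0)
    return 1.0 - (1.0 - p_any) ** (1.0 / pool_size)
```

With `pool_size=1` this reduces to the usual correction for individual tests. Under these assumptions the false-positive term is applied once per pool rather than once per sample, which is one way to see why pooling can limit the influence of imperfect specificity.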

Conclusion

Sample pooling should be considered in COVID-19 prevalence estimation efforts.

Full Text

This preprint is available for download as a PDF.