There are uncertainties inherent to any scientific measurement, and there are statistical uncertainties that arise from the meta-analysis of repeated studies. The resolution of a scientific method can be treated as negligible when it exceeds the resolution required to answer the scientific question. For a comparatively blunt measuring instrument like opinion polling, however, the resolution cannot be treated as negligible. Polling uncertainty therefore contains two components: the uncertainty of the polling method itself and the statistical uncertainty of repeated measurements.
Polls were widely criticized following the 2016 US General Election. A report from the American Association for Public Opinion Research (AAPOR, 2017) noted that “[t]he day after the election, there was a palpable mix of surprise and outrage directed towards the polling community, as many felt that the industry had seriously misled the country about who would win.” However, state-level polls had correctly predicted the outcome for individual states—but only when the statistical confidence interval lay outside the method-specific uncertainty interval. When confidence intervals overlapped with the uncertainty interval, the results were uncertain: both candidates won some of the uncertain state-level contests.
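To make this decision rule concrete, the sketch below classifies a single state-level contest from the confidence interval of its polled margin. The ±3.5 percentage-point uncertainty interval and the margins shown are illustrative assumptions, not results from this study.

```python
def classify_state(ci_low, ci_high, uncertainty=3.5):
    """Classify one state-level contest.

    ci_low, ci_high: 95% confidence interval of the polled margin
    (candidate A minus candidate B, in percentage points).
    uncertainty: half-width of the method-specific uncertainty interval.
    The contest is called only when the entire confidence interval lies
    outside the uncertainty interval; otherwise it is too close to call.
    """
    if ci_low > uncertainty:
        return "call for candidate A"
    if ci_high < -uncertainty:
        return "call for candidate B"
    return "too close to call"

# Illustrative margins only, not real polling data.
print(classify_state(4.2, 8.1))    # call for candidate A
print(classify_state(-1.0, 5.0))   # too close to call
print(classify_state(-9.3, -4.6))  # call for candidate B
```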
Prior to the 2016 election, we selected ±3.5% to reflect recent polling errors. For example, Obama beat his polls by 3 percentage points in 2012, and Republicans beat their polls by 4 percentage points in the 2014 midterms. Although this choice was informed by historical data points, one limitation of this study is that selecting the method-specific uncertainty interval involved a subjective element. Jennings and Wlezien (2018) subsequently performed a more sophisticated historical analysis of polling errors. Based on 175 presidential elections, they estimated a mean absolute polling error of 2.7 percentage points. Their analysis focused on national-level polls, however, and is not directly applicable to the sub-national polls that we refer to as state-level polls. Because the analysis by Jennings and Wlezien (2018) spanned multiple countries, additional research may be required to determine whether the uncertainty interval should be tailored to a specific country or electoral system.
Polls attempt to infer population-level attributes from a subset of the population. One concern is whether the error rates of modern polls are stable. Jennings and Wlezien (2018) found that election year had a trivial, non-significant effect (P = 0.85) when modeling the absolute error as a dependent variable. This indicates that there has been no discernible decline in the accuracy of polls over time and that it is feasible to set a generalizable method-specific uncertainty interval. However, the selection of uncertainty intervals may still vary by country, polling frequency, polling quality, or polling method, and the interval does not necessarily need to equal the mean absolute polling error.
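As a rough illustration of how such a stability check can be performed (not a reproduction of the Jennings and Wlezien analysis or data), one could regress the absolute polling error on election year and inspect the P-value of the slope:

```python
# Sketch: test whether absolute polling error trends with election year.
# The data are simulated stand-ins, not the Jennings and Wlezien (2018) dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
years = rng.integers(1952, 2017, size=175)                     # hypothetical election years
abs_error = np.abs(rng.normal(loc=2.7, scale=1.5, size=175))   # hypothetical |error|, in pp

result = stats.linregress(years, abs_error)
print(f"slope = {result.slope:.3f} pp/year, P = {result.pvalue:.2f}")
# A large P-value (Jennings and Wlezien report P = 0.85) gives no evidence
# that polling accuracy has changed systematically over time.
```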
The goal of this study was to illustrate that a method-specific uncertainty interval is distinct from, and complementary to, statistical measures of uncertainty. The uncertainty intervals provide a strategy for assessing the reliability of the data that is independent of the predictive models that are then applied to those data. The predictions themselves can be improved in a number of ways. This study did not involve sophisticated predictive models. For simplicity, this analysis treated all state-level contests as binary, winner-take-all outcomes, even though some states do not award all of their electoral votes to the candidate who wins the statewide vote. Predictions could also be improved by selecting measures of central tendency other than the mean, and by estimating statistical confidence intervals using non-parametric methods such as the bootstrap. For simplicity, this study extracted 95% confidence intervals from a function for a two-sided t-test. These are parametric estimates that assume normally distributed data, an assumption that is easily violated.
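For readers who want to see the contrast, the sketch below computes both a t-based interval and a percentile-bootstrap interval for the mean polled margin. The margins are hypothetical, and the bootstrap shown is the simple percentile variant, one of several alternatives.

```python
# Sketch: 95% confidence interval for the mean polled margin, computed two ways.
# The margins below are hypothetical; the study's data are not reproduced here.
import numpy as np
from scipy import stats

margins = np.array([2.1, 3.4, -0.5, 1.8, 4.0, 0.9, 2.7])  # hypothetical margins (pp)

# Parametric 95% CI from the t distribution (assumes approximately normal data).
mean = margins.mean()
sem = stats.sem(margins)
t_lo, t_hi = stats.t.interval(0.95, len(margins) - 1, loc=mean, scale=sem)

# Non-parametric 95% CI from a percentile bootstrap of the mean.
rng = np.random.default_rng(1)
boot_means = [rng.choice(margins, size=len(margins), replace=True).mean()
              for _ in range(10_000)]
b_lo, b_hi = np.percentile(boot_means, [2.5, 97.5])

print(f"t-based 95% CI:   ({t_lo:.2f}, {t_hi:.2f})")
print(f"bootstrap 95% CI: ({b_lo:.2f}, {b_hi:.2f})")
```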
This study illustrates a broader limitation of the polling field: the paucity of reliable polling data for every state. Although there were enough data to limit the 2016 analysis to high-quality polls, the 2020 analysis was forced to include a number of polls that were graded D or D-, including polls conducted over the internet. Note that specifically excluding polls rated D or D- did not remove ungraded polls. This was intentional, because the timeline for posting a grade and the criteria for an ungraded poll remained unclear to us. We ultimately decided to include all polls because the following states or districts did not have enough reliable polls to allow poor-quality polls to be excluded: Arkansas, Connecticut, District of Columbia, Delaware, Hawaii, Idaho, Illinois, Indiana, Louisiana, Nebraska, North Dakota, Oregon, Rhode Island, South Dakota, Tennessee, Utah, Vermont, Washington, West Virginia, and Wyoming. This finding highlights a need for additional, quality polling data so that certainty can be assessed more reliably. Notably, poor-quality polls were also identified in the American Association for Public Opinion Research report on the 2016 election polling (AAPOR, 2017). AAPOR proposed several solutions, and yet the number of high-quality state-level polls has since decreased to the point that we were forced to include poor-quality polls in some states in 2020.
Many pollsters use predictive models to estimate probabilities. Predictive models often include their own probability estimates that are based on permutations of the existing data. Permutations and models are reliable, however, only when projections capture both the uncertainty of the underlying data and the uncertainty of the model itself. Modeling can be used to determine an error rate, which in turn can be used to calculate probabilistic forecasts. But modeling cannot generate a forecast that has more certainty than the aggregate uncertainty of the underlying data. As data modeling plays a more central role in science and society, we need to emphasize the distinction between the underlying data and the models that are derived from them. Polling data should not necessarily be blamed for incorrect, or overly confident, modeling predictions. In 2016, some models claimed a certainty of 99% (AAPOR, 2017) even though this analysis showed that 30.7% of the electoral-college vote was too close to call. More importantly, no candidate could claim with certainty to reach the 270 electoral college votes needed to win. The 2016 results were uncertain; it would have been more accurate to report them as too close to call. We cannot claim to make highly certain predictions from highly uncertain data.
In contrast to 2016, the 2020 polling data provide confidence that one candidate will reach the minimum of 270 votes required to win the US electoral college. Although 139 (23.4%) of the 538 available electoral votes are classified as uncertain, the data themselves support more confident predictions because one candidate has reached the minimum requirement of 270 votes in the electoral college based on state-level results that are outside the uncertainty interval.
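The tally underlying statements like this can be sketched as follows; the state classifications below are placeholders rather than the 2016 or 2020 results of this study.

```python
# Sketch: check whether a candidate reaches 270 electoral votes using only
# states whose results lie outside the uncertainty interval. The entries
# below are placeholders, not the study's actual classifications.
state_calls = {
    "State 1": (55, "A"),          # (electoral votes, call)
    "State 2": (38, "B"),
    "State 3": (29, "uncertain"),
    # ... remaining states and districts would be listed here
}

totals = {"A": 0, "B": 0, "uncertain": 0}
for electoral_votes, call in state_calls.values():
    totals[call] += electoral_votes

all_votes = sum(ev for ev, _ in state_calls.values())
print(f"uncertain: {totals['uncertain']} of {all_votes} electoral votes "
      f"({100 * totals['uncertain'] / all_votes:.1f}%)")

if max(totals["A"], totals["B"]) >= 270:
    print("One candidate reaches 270 from certain states alone")
else:
    print("No candidate reaches 270 from certain states; too close to call")
```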
The goal here is to identify when the underlying data are too uncertain to claim highly certain predictive models, something that is often obscured even from those who develop the models.
Conversely, confidence in the data is not an endorsement of the predictive models themselves. The fact that the data support a certain prediction does not imply that every prediction made using those data will be valid. Variability between models can be attributed to differences in modeling approach, training procedure, the decisions used to select training and input data, and the methods used to determine the probability of the results. All of these differences may be scientifically valid, and they may lead to differing predictions.
We need to improve the way we explain uncertainty: uncertain data are not wrong, only uncertain. The unappealing statistical reality is that polls are sometimes too close to call. While general and sophisticated consumers alike often bristle when data scientists present an uncertain conclusion, it is a disservice to make overly confident predictions from inherently uncertain data. Leading up to the 2016 election, many pollsters understood that the election was close and that there was a high degree of uncertainty, but most did a poor job of articulating that uncertainty. Simply reporting a margin of “±3%” does not adequately convey the underlying uncertainty. In cases where the underlying data do not support a confident prediction, the general public’s confidence in the social sciences is bolstered when scientists can recognize, and communicate simply, that an election result is too close to call. As the complexity of data models increases, data scientists can help non-specialists become better consumers of data models. When cases of uncertainty arise, we have an opportunity to educate consumers about the limitations of data measurements and data modeling.