Main Findings and Implications
Our results show no significant difference in risk literacy between medical students and practicing GPs. Although their test results were above average, GPs lack understanding of the underlying success and failure probabilities of mammography screening and therefore cannot accurately convey these to their patients.
The first major finding was that GPs showed above-average test results on the BNT but did not perform significantly better than medical students. GPs reached a mean risk literacy score of 2.33 points, which according to Cokely et al. [15] is not meaningfully better than the 2.2 points considered average for “highly numerate individuals.” The likelihood that a positive result on a diagnostic test is correct (the test’s positive predictive value) strongly depends on the prevalence of the disease. If prevalence is low, even a highly accurate diagnostic test will produce many false positives relative to true positives, so its positive predictive value will be low. GPs therefore need to be practiced in interpreting test results and the corresponding risks accurately. However, despite their work in settings with low prevalence for many diseases, GPs show no better risk literacy than do average medical students.
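The dependence of the positive predictive value on prevalence follows directly from Bayes' theorem and can be sketched in a few lines of Python. The sensitivity and specificity used here (90% each) and the three prevalence levels are illustrative assumptions chosen for round numbers; they are not values from this study or from any mammography trial, and the function name is our own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test, by Bayes' theorem."""
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}")
```

Under these assumed values, the same test yields a positive predictive value of roughly 8% at 1% prevalence, 50% at 10% prevalence, and 90% at 50% prevalence, which illustrates why a positive result in a low-prevalence setting is far less conclusive than the test's accuracy alone suggests.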
Although the ability to work with numbers is important for physicians, only a handful of studies have explored their numeracy skills. In these studies, physicians have repeatedly shown poor understanding of statistics and quantitative aspects of research [25,26,27,28]. They particularly lack the ability to apply Bayesian inference, which is used to estimate the probability of a hypothesis in the face of new data [29,30] (see also Wegwarth & Gigerenzer [31] for a good summary). In the previously mentioned study by Gigerenzer et al. [16], most gynecologists grossly overestimated the probability of cancer after positive mammography screening results, and their probability estimates were highly variable. Not surprisingly, studies of medical students have shown similar results. In 2002, Sheridan and Pignone provided 62 first-year medical students with three numeracy questions associated with information about the baseline risk for developing a hypothetical disease [32]. They reported that almost one-quarter of the students had trouble performing basic numerical operations. Although the students showed a relatively high level of numeracy (77% answered all questions correctly), only 61% correctly interpreted quantitative data for treatment of a hypothetical disease. The authors concluded that medical students do not understand statistical concepts well, and our findings suggest that this conclusion also applies to fully trained physicians.
We found that GPs performed poorly in applying numeracy skills in the given case scenario. Nearly three-quarters of the GPs entered values of >80%, a response quite different from the actual positive predictive value. Risk literacy correlated significantly and positively with the ability to solve the case scenario. This problem is not confined to physicians. In the above-mentioned mammography study by Schwartz et al., even the most arithmetically able patients made accurate assessments of risk less than half the time [20]. In reality, physicians and patients are often confronted with additional irrelevant screening evidence, such as improved 5-year survival and increased early detection. Wegwarth et al. found that “many physicians did not distinguish between irrelevant and relevant screening evidence” [33], a situation that almost certainly hampers understanding of the benefits and risks of screening. We conclude that GPs lack risk literacy and therefore do not fully understand the numeric estimates of probability associated with routine screening procedures.
In a shared decision-making approach, treatment decisions are made jointly by doctors and patients [34], with the patients’ wishes and needs regarding the therapy receiving considerable weight. This is the core element of patient-centered medicine [34]. Adequate communication of the risks of disease and of treatment modalities is an important factor in ensuring successful decision-making. Decision-making based on correct understanding of the underlying risks and benefits is central to the physician–patient relationship and is therefore an essential determinant of patient satisfaction and compliance [11,35]. Our results indicate that physicians do not have the numeracy necessary to interpret cancer screening statistics. As Moyer (2012) states, “Expecting them to communicate this information to patients is a stretch” [36].
Training in risk literacy may help to enhance students’ numeracy skills, although such training does not always appear to be effective [37]. GPs who successfully solved the case scenario attained significantly higher BNT scores (3.00 points) than those who did not, a finding reflected in the positive correlation between solving the case scenario and BNT scores. According to Cohen, a correlation coefficient of >.30 reflects a medium effect size [38]. The BNT can therefore help to identify individual weaknesses in numeracy. As a rapid measure of individual deficiencies in risk literacy, it could be used to enhance the effectiveness of training. Several tools exist to increase understanding of risks [39,40]. However, introducing such new training into an already crowded medical curriculum remains a challenge.
The most promising way forward is the use of alternative statistical formats, particularly natural frequencies (such as 100 of 1,000 instead of 10%), for presenting the risks in screening tests. Developers of patient decision aids should therefore preferentially use a natural frequency format for test and screening decisions [41]. The natural frequency method is very effective (Cohen’s d effect size 0.69; 95% CI 0.45–0.93) and should be used with patients and medical doctors [42]. Medical journal editors should encourage authors of publications about screening to present their results in terms of natural frequencies to avoid misinterpretation. For all other cases, students should be trained to generate natural frequencies themselves from the data provided.
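Generating natural frequencies from probabilities is a mechanical step, sketched below in Python. The prevalence (1%), sensitivity (90%), and specificity (90%) are hypothetical values chosen so that the counts come out as whole numbers; they are not taken from this study or from mammography data, and the function name and dictionary keys are our own.

```python
def natural_frequencies(prevalence, sensitivity, specificity, population=1000):
    """Convert test probabilities into whole-number counts for a reference population."""
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_positives = round(diseased * sensitivity)
    false_positives = round(healthy * (1.0 - specificity))
    return {
        "population": population,
        "diseased": diseased,
        "healthy": healthy,
        "true_positives": true_positives,
        "false_positives": false_positives,
    }


# Hypothetical values, for illustration only.
counts = natural_frequencies(prevalence=0.01, sensitivity=0.90, specificity=0.90)
all_positives = counts["true_positives"] + counts["false_positives"]

print(f"Of {counts['population']} people, {counts['diseased']} have the disease; "
      f"{counts['true_positives']} of them test positive.")
print(f"Of the {counts['healthy']} people without the disease, "
      f"{counts['false_positives']} also test positive.")
print(f"So {counts['true_positives']} of {all_positives} positive results are correct.")
```

With these assumed values, the statement reduces to "9 of 108 positive results are correct," which can be read off without any explicit probability calculation. Rounding to whole persons is a simplification that works cleanly here; for other inputs a larger reference population may be needed to avoid rounding error.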
Limitations
The main limitation of this study is the difference in questionnaire return rates between GPs and students. Whereas over 90% of the medical students returned questionnaires, far fewer GPs did so. Because the population of interest was a large, geographically dispersed one, a postal questionnaire was our only financially viable option. Although we tried to increase the response rate [43], only about 10% of the contacted GPs replied. Non-response reduces the effective sample size and may have introduced selection bias.
Selection bias can severely limit the interpretation of empirical findings. The baseline characteristics provided no indication of systematic differences between the study populations and our analyzed samples. If selection bias did occur in this study, it probably worked against the reported findings: GPs who answered the postal survey were likely to be more confident about their numeracy skills than those who did not. Our results may therefore actually overestimate GPs’ competence in risk literacy and their understanding of routine screening procedures.