Main Findings and Implications
Our results show no difference in risk literacy between current students and current GPs. Although their test results were above average, GPs lack understanding of the underlying success and failure probabilities of mammography screening and will therefore be unable to convey these accurately to their patients.
The first major finding of our study is that GPs showed above-average test results on the BNT, but no significant advantage over medical students could be detected. GPs reached an average risk literacy score of 2.33 points, which according to data of Cokely et al. [15] is not meaningfully better than the 2.2 points considered average for “highly numerate individuals.” The likelihood that a positive result on a diagnostic test is correct (the test’s positive predictive value) strongly depends on the prevalence of the disease. If prevalence is low, even the best diagnostic test will not be very effective. GPs require accumulated experience in accurate interpretations of tests and the resulting risks under these conditions. But, despite their work in settings with low prevalence for many diseases, GPs show no better risk literacy than do average medical students.
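The dependence of the positive predictive value on prevalence can be made concrete with a short sketch of Bayes' rule. The sensitivity and specificity below are hypothetical values chosen only for illustration; they are not figures from this study or from the case scenario.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' rule: probability of disease given a positive test result."""
    true_pos = prevalence * sensitivity                   # sick and test-positive
    false_pos = (1.0 - prevalence) * (1.0 - specificity)  # healthy but test-positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics (assumptions, not study data).
SENSITIVITY = 0.90
SPECIFICITY = 0.91

if __name__ == "__main__":
    # The same test applied in settings with increasing prevalence.
    for prevalence in (0.01, 0.10, 0.50):
        ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

Under these assumed values, the same test yields a positive predictive value of roughly 9% at 1% prevalence but roughly 91% at 50% prevalence, which is why estimates above 80% can be far too high in a low-prevalence screening setting.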
Although the ability to work with numbers is important for physicians, only a handful of studies have explored their numeracy skills. In these studies, physicians have repeatedly been shown to have a poor understanding of statistics and quantitative aspects of research [25-28]. They are particularly lacking in the ability to apply Bayesian inference, which is used to estimate the probability of a hypothesis in the face of new data [29,30] (see also Wegwarth & Gigerenzer [31] for a good summary). In the previously mentioned study by Gigerenzer et al. [16], the majority of gynecologists grossly overestimated the probability of cancer after positive results in mammography screening, and their probability estimates were highly variable. Not surprisingly, studies among medical students have shown similar results. In 2002, Sheridan and Pignone provided 62 first-year medical students with three numeracy questions associated with information about the baseline risk for developing a hypothetical disease [32]. They reported that almost one-quarter of the first-year medical students had trouble performing basic numerical operations. Although the students showed a relatively high level of numeracy (77% answered all questions correctly), only 61% correctly interpreted quantitative data for treatment of a hypothetical disease. The authors concluded that medical students do not understand statistical concepts well, and our study suggests that this conclusion also applies to fully trained physicians.
As a consequence, GPs performed poorly in applying numeracy skills in the given case scenario. Nearly three-quarters of the GPs entered values of > 80%, an estimate far removed from the real predictive value. Risk literacy correlated significantly and positively with the ability to solve the case scenario. This problem is not limited to physicians. In the mammography study by Schwartz et al. mentioned earlier, even the most arithmetically able patients made accurate assessments of risk less than half the time [20]. In reality, physicians and patients are often confronted with additional irrelevant screening evidence such as improved 5-year survival and increased early detection. Wegwarth et al. reported that “many physicians did not distinguish between irrelevant and relevant screening evidence” [33], a situation that almost certainly hampers understanding of the benefits and risks of screening. In sum, GPs lack risk literacy and in consequence do not fully understand the numeric estimates of probability associated with routine screening procedures.
In a Shared Decision Making approach, the treatment decision is made jointly by the doctor and the patient [34], with the patient’s wishes and needs about the therapy receiving considerable weight. This is the core element of patient-centered medicine [34]. Adequate communication of risks of disease and treatment modalities is an important factor for ensuring successful decision-making. Decision-making based on correct understanding of the underlying risks and benefits is central to the physician-patient relationship and is therefore an essential determinant of patient satisfaction and compliance [11,35]. We were able to show that physicians clearly do not have the necessary numeracy to interpret cancer screening statistics themselves. As Moyer (2012) states, “Expecting them to communicate this information to patients is a stretch” [36].
Training in risk literacy may help to enhance students’ numeracy skills, but this does not always seem to be the case [37]. GPs who successfully solved the case scenario attained significantly higher BNT scores (3.00 points) than participants who did not. This is also reflected in the positive correlation of solving the extra scenario with BNT scores. According to Cohen, a correlation coefficient of > 0.3 reflects a medium effect size [38]. The BNT can therefore help to identify individual weaknesses in numeracy. Because it is a quick questionnaire for detecting individual deficiencies in risk literacy, it may enhance the effectiveness of training. Several tools exist to increase understanding of risks [39,40]. However, the introduction of such new training in an already crowded medical studies curriculum remains a challenge.
The most promising way forward is the use of alternative statistical formats, in particular natural frequencies (such as 100 of 1000 instead of 10%), for presenting the risks of screening tests. Accordingly, developers of patient decision aids are advised to prefer a natural frequency format for test and screening decisions [41]. The natural frequency method is very effective (effect size Cohen’s d 0.69; 95% CI 0.45–0.93) and should be applied with patients and medical doctors [42]. Medical journal editors should encourage authors of publications about screening to present results in natural frequencies to avoid misinterpretation. For all other cases, students should be trained to generate natural frequencies themselves from the data offered.
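Generating natural frequencies from percentage data is a mechanical step, as the following minimal sketch suggests. The function name, the reference population of 1000, and the prevalence, sensitivity, and specificity values are all hypothetical and chosen only for illustration; they are not data from this study.

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Translate conditional probabilities into whole-person counts."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_pos = round(sick * sensitivity)               # sick people who test positive
    false_pos = round(healthy * (1.0 - specificity))   # healthy people who test positive
    return {
        "sick": sick,
        "healthy": healthy,
        "true_positives": true_pos,
        "false_positives": false_pos,
    }


if __name__ == "__main__":
    # Hypothetical values (assumptions, not study data).
    counts = natural_frequencies(1000, 0.01, 0.90, 0.91)
    positives = counts["true_positives"] + counts["false_positives"]
    print(f"Of {counts['sick'] + counts['healthy']} people, {counts['sick']} have the disease.")
    print(f"{positives} test positive; {counts['true_positives']} of them actually have the disease.")
```

With these assumed inputs, the statement becomes "9 of the 98 people who test positive have the disease", which conveys the low predictive value without requiring any explicit Bayesian calculation.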
Limitations
The main limitation of this study is the difference in questionnaire return rates between GPs and students. Whereas over 90% of the medical students were recruited successfully, far fewer of the GPs completed the questionnaire. Because the population of interest was large and geographically dispersed, the postal questionnaire was our only financially viable option. Although we tried to increase the response rate [43], only about 10% of the contacted GPs replied. Non-response reduces the effective sample size and may have introduced selection bias.
Selection bias is a severe limitation for interpretation of empirical findings. Baseline characteristics give no indication of systematic differences between the study populations and our analyzed samples. In our study, it is likely that if selection bias occurred, it would work against the reported findings; i.e., there is every chance that GPs who answered the postal survey were much more confident about their numeracy skills than were those who did not. Our results may therefore actually overestimate the competence of GPs in risk literacy and their understanding of routine screening procedures.