The survey opened in September 2023 and closed in January 2024. We obtained 101 responses (Table 1), with wide representation across stage of career (24% professors; 51% senior lecturers; 22% junior statisticians) and areas of work (28% early-phase trials; 44% drug trials; 38% health service trials). Most participants were from UK Clinical Trials Units (66%) or the Australian Clinical Trials Alliance (20%), with smaller numbers from the Statistical Society of Australia (9%) and the Canadian Trial Statisticians group (5%). Item missingness was below 1% for the closed-ended questions (see the tables for exact numbers). For the open-ended questions, we received a total of 317 free-text responses, of which 153 (see the tables for the numbers of unique respondents) were assessed as providing substantive information and were included in the thematic analysis.
Themes related to difficulties when interpreting statistical findings
All of the free-form text responses and associated themes for this section are provided in full in Supplementary Table 1 (which contains 51 quotes from 51 participants), and the key themes are summarised in Table 5. The key themes were culture (with subthemes: academic culture, review culture, and academic team culture), knowledge (with subthemes: knowledge of statisticians and knowledge of clinicians), and the construct of the clinically important difference (with subthemes: the difficulty of determining the minimally important difference, the importance of patient and public involvement, and the feasibility of defining these concepts).
Table 5
Main themes emerging from the free-form text responses to questions on barriers and facilitators to interpretation with respect to preferences
Theme | Sub-theme | Explanation | Example quotes |
Culture | Culture of academia (barrier) | Pressure to publish in high-impact journals and to demonstrate an apparent positive impact contributes to the misinterpretation of statistical findings | “The problems associated with significance testing are deeply ingrained in the academic community, and I suspect it will require a generational change (at least) to improve” “Researchers may do better in their careers and gain power and influence if they have a significant result. It is hard to counteract this bias - I think we might have to remodel society and the human mind…” “Politics and issues around peer-reviewed journals is a big player in this. The need to publish in high-ranking journals for career progression.” |
| Culture of the review process (barrier) | An overly prescriptive editorial review process, such as one requiring yes/no conclusions, can lead to statistical findings being misinterpreted | “The main difficulty is that journal editors require the result to be described using yes/no language (even while the journal simultaneously claims to support avoidance of hypothesis testing/p-values!).” “Major journals have long held up the erroneous "strong steer" that the findings of a trial should be reduced to a mechanical yes/no according to whether p < .05 for the primary outcome. This has had widespread pernicious effects.” |
| Culture of the review process (facilitator) | The editorial review process, when working well, can help set appropriate boundaries on the interpretation of statistical findings. | “Teams often want a significant p-value, and in these cases it is often the journal editor who comes to our rescue and insists that the message is toned down.” “It is normally the case that the reviewer asks to tone down the finding.” |
| Academic team culture (facilitator) | When academic teams work well, the collaboration between different disciplines can create a balanced interpretation of the study findings. | “The best trials I have worked on have this equal balance, and I feel that supports the scientific, statistical, and operational aspects of the trial best.” |
| Academic team culture (barrier) | When academic teams exhibit power imbalance, the knowledge of the statistician can get overlooked | “Often you are the only statistician in a room of excitable clinicians. It can be difficult to hold your ground.” “All depends on how valued the statistician's opinions are by the TMG/CI” “So education, yes, but also I think the 'power balance' in the study team is also really important.” |
Knowledge | Lack of engagement / knowledge by statistician (barrier) | The statistician might not always be fully invested in the research question and can become disengaged or may lack the knowledge needed to make an appropriate contribution | “Clinically important differences are provided to me. I don't determine them.” “There is too often an assumption that it's the fault of the journals or of the clinicians (only), but the statisticians are the ones who have taught and promoted misleading ideas in the past.” |
| Misunderstanding by clinicians (barrier) | Clinicians may lack the statistical knowledge needed to interpret the findings without full statistical support | “Clinicians tend to interpret the results on their own without stats support and this could lead to inaccurate conclusions. In addition, they concentrate on p-values rather that the MCID and clinical importance of the results as it was predefined in power calculations.” |
Clinically important differences | The difficulty in determining clinically important differences (barrier) | Determining minimally important differences can be challenging | “There are also no 'minimum important differences' in the literature for my clinical colleagues to reference when interpreting binary outcomes, so they fall back on interpreting whether p is less than 0.05 or whether confidence intervals include zero.” |
| The patient role in determining clinically important differences (facilitator) | The identification of a minimally important difference requires patient input | “…With binary outcomes - sometimes PPI input can be more important ie what would be an important change for them.” |
| Minimally important differences can be very small and vary for different people (barrier) | In some settings the minimally important difference can be so small as to not be feasible to evaluate in a conventional trial and they can also vary for different people and in different contexts | “I think considerations of minimum important difference are subtle - for a patient a very small effect may be important. I think it's natural to power trials for effects that are 'worth trialling' rather than are minimum important effect for patients.” |
The first subtheme of culture relates to the underlying academic culture. Participants described an entrenched way of thinking, strongly entwined with conflicts of interest and human nature, that would require a wholesale culture shift to change:
“The problems associated with significance testing are deeply ingrained in the academic community, and I suspect it will require a generational change (at least) to improve” [Professor, ACTA-STInG, full-scale randomised trials]
“Researchers may do better in their careers and gain power and influence if they have a significant result. It is hard to counteract this bias - I think we might have to remodel society and the human mind.” [Professor, UKCRU, full-scale, pilot, healthcare and drug trials]
The second subtheme relates to the culture of the review process. Some participants commented on how the review process created a hindrance:
“The main difficulty is that journal editors require the result to be described using yes/no language (even while the journal simultaneously claims to support avoidance of hypothesis testing/p-values!).” [Professor, CANSTAT, full-scale drug trials]
Others commented on the pressure to publish definitive findings with a clear, rather than uncertain, message:
“Politics and issues around peer-reviewed journals is a big player in this. The need to publish in high-ranking journals for career progression.” [Senior statistician, UKCRU, early-phase and full-scale trials]
However, some participants expressed the opinion that the editorial and review process had helped set appropriate boundaries around interpretation:
“Teams often want a significant p-value, and in these cases it is often the journal editor who comes to our rescue and insists that the message is toned down.” [Professor, UKCRU, full-scale, pilot, healthcare and drug trials]
The third subtheme relates to academic team culture. Participants reported that a positive, balanced team dynamic can help:
“The best trials I have worked on have this equal balance, and I feel that supports the scientific, statistical, and operational aspects of the trial best.” [Senior statistician, UKCRU, early-phase and full-scale trials]
Others reported power imbalances within study teams that had at times undermined a balanced interpretation:
“Often you are the only statistician in a room of excitable clinicians. It can be difficult to hold your ground.” [Senior statistician, UKCRU, full-scale health research trials]
The second main theme concerned knowledge. The lack of balance within academic teams was especially problematic when clinicians misunderstood statistics and p-values:
“Clinicians tend to interpret the results on their own without stats support and this could lead to inaccurate conclusions. In addition, they concentrate on p-values rather that the MCID and clinical importance of the results as it was predefined in power calculations.” [Senior statistician, UKCRU, all trial types]
However, a lack of engagement or knowledge on the part of statisticians could also be a problem:
“There is too often an assumption that it's the fault of the journals or of the clinicians (only), but the statisticians are the ones who have taught and promoted misleading ideas in the past.” [Professor, SSA, full-scale health-service research trials]
The final main theme relates to the construct of the clinically important difference. Participants described the difficulty of determining minimally clinically important effects in practice, particularly for binary outcomes:
“There are also no 'minimum important differences' in the literature for my clinical colleagues to reference when interpreting binary outcomes, so they fall back on interpreting whether p is less than 0.05 or whether confidence intervals include zero.” [Junior statistician, ACTA-STInG, full-scale trials]
Participants also described the important role that patients have in determining what size of effect is important in practice:
“…With binary outcomes - sometimes PPI input can be more important i.e. what would be an important change for them.” [Senior statistician, UKCRU, pilot and full-scale drug trials]
Finally, participants noted that the minimally important difference might not be a fixed quantity: it can vary between people and contexts, and sometimes a very small effect can be clinically important, even if it is smaller than a trial could detect with reasonable power (see also Supplementary Table 1):
“I think considerations of minimum important difference are subtle - for a patient a very small effect may be important. I think it's natural to power trials for effects that are 'worth trialling' rather than are minimum important effect for patients.” [Professor, UKCRU, full-scale health research trials]