We focus on eliciting expert information about the anticipated performance of the BIMC test and the Awaji criteria. These elicitations took place in 2019 and 2020, prior to the updated Gold Coast criteria (Shefner et al., 2020), and so reflect the knowledge and understanding of the experts at that time.
2.1 Participants
A total of seven experts in Neurology and MND participated in the elicitations for the model parameters. Three experts were directly involved in the development of the BIMC test, and the design of the trial (Experts 5-7). Four further experts were recruited through referrals from the first three experts, as well as a request sent to the British Society for Clinical Neurophysiology (BSCN) mailing list (Experts 1-4). These experts were recruited solely for their opinion on the expected performance of the diagnostic tests.
Background information about each expert is provided in Table 2.
Table 2. Expert demographics

| Expert ID | Job title | Background | Experience (Years) | Self-reported strengths and weaknesses |
| --- | --- | --- | --- | --- |
| 1 | Neurology Registrar | Molecular genetics and physiology | 5 | Unfamiliar with BIMC prior to elicitation |
| 2 | Senior Clinical Lecturer & Hon Cons Clinical Neurophysiologist | Biomarkers research | 5 | Familiar with previous work on BIMC |
| 3 | Consultant | Clinical studies | 10 | - |
| 4 | Consultant Clinical Neurophysiologist | - | - | Some experience with BIMC |
| 5 | Consultant Neurologist/Clinical Neurophysiologist | Basic pathophysiology; clinical electrodiagnostics | 8-10 | Published literature and personal clinical experience with the BIMC test |
| 6 | Consultant Neurophysiologist | Research experience in upper motor neuron dysfunction in MND | 10 | Good background knowledge of MND/coherence and electrophysiology, but distant from front-line neurology care |
| 7 | Consultant Neurologist | Neuroscience research and clinical trials | 25 | Strong clinical experience, less up-to-date on lab-based research |
2.2 Expert Elicitation
Expert elicitation is the process of quantifying expert judgements, knowledge, and experience into probability distributions (Bojke, 2021). It is often used when specifying a prior distribution in Bayesian statistics and can be used in both the design and analysis of a trial.
Expert judgements from multiple experts can be combined to produce a distribution representative of a wider range of views. Studies suggest that these aggregated distributions tend to outperform individual experts in terms of informativeness, which measures how concentrated (i.e., how low in uncertainty) a distribution is, and calibration, which measures how well the distribution captures the true values (Flandoli, 2011).
We aggregated expert judgements using two extensively validated methods. The Classical Method (CM) asks experts to make judgements on a series of seed questions, whose answers are known to the elicitors but not to the experts. These answers are used to score and weight the experts in a mathematical aggregation (Cooke, 1988): experts who specify probabilities more accurately on the seed questions receive greater weight when the judgements about the quantities of interest are combined. The Sheffield Elicitation Framework (SHELF) instead provides a structure for a group of experts to discuss the quantities of interest and form an aggregated distribution themselves (O'Hagan, 2019). Experts first respond to the elicitation questions individually, and the responses are shared anonymously with the group. The experts then determine, as a group, the final set of judgements to represent what a rational impartial observer would conclude having heard all of the individual judgements.
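As an illustration, the CM's weighted combination can be viewed as a linear opinion pool: a mixture of the experts' individual distributions, with mixture weights derived from their seed-question scores. The sketch below uses made-up Beta fits and weights, not the study's actual values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Beta(a, b) fits for three experts' prior distributions and
# illustrative CM performance weights -- none of these are the study's values.
expert_params = [(8.0, 2.0), (6.0, 3.0), (10.0, 2.5)]
weights = np.array([0.5, 0.2, 0.3])  # from seed-question scores; sum to 1

def sample_linear_pool(n_draws):
    """Draw from the weighted mixture (linear opinion pool) of expert priors."""
    idx = rng.choice(len(expert_params), size=n_draws, p=weights)
    a = np.array([expert_params[i][0] for i in idx])
    b = np.array([expert_params[i][1] for i in idx])
    return rng.beta(a, b)

pooled_draws = sample_linear_pool(10_000)
```

Sampling from the mixture, rather than averaging parameters, preserves any multimodality that arises when experts disagree.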
2.3 Elicitation Implementation
Two rounds of elicitations were held. The first was carried out in March 2019 and included the three experts involved in the BIMC study. This in-person elicitation meeting used the SHELF format, involving both an individual and a group elicitation stage. The individual elicitations were completed online before the group meeting. The second elicitation took place over the course of 2020 and involved an updated version of the individual elicitation used during the first round. The documents used as part of the elicitation are provided in Appendix A.
During both rounds of elicitation, two types of quantity were elicited. The first type was the five parameters relating to the BIMC trial. The second was responses to seed questions, elicited as part of the CM. In this paper, we focus on the first type, with the second provided in the supplementary material. For both groups, the minimum, 25% quantile, median, 75% quantile, and maximum were elicited for each quantity of interest.
The SHELF aggregation combined the views of the three experts involved in the trial design (labelled Experts 5, 6, and 7), while the CM aggregation combined views from all available experts.
Experts 1 and 4 only provided estimates for the parameters relating to the Awaji criteria’s performance. As such, their prior distributions could only be used for estimating values related to the Awaji criteria, and not the performance of BIMC or sample sizes.
2.4 Bayesian Sample Size Calculations
Assurance is the probability that a trial will result in a successful outcome and can be used analogously to statistical power in determining an appropriate sample size (O’Hagan, 2005). As a Bayesian method, assurance requires a prior distribution to represent the available knowledge prior to the trial. Mathematically, the assurance can be calculated as
$$\text{Assurance} = \int P(\text{Successful Outcome} \mid \theta)\, P(\theta)\, \text{d}\theta$$
where \(P(\text{Successful Outcome} \mid \theta)\) is the probability of a successful outcome (such as the null hypothesis being rejected) given a set of parameter values \(\theta\), and \(P(\theta)\) is the prior distribution for the parameters, representing the current state of knowledge.
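In practice this integral is typically evaluated by Monte Carlo: draw parameter values from the prior, compute the probability of a successful outcome at each draw, and average. A minimal sketch under simplifying assumptions: a one-sided z-test of a single proportion against p0 = 0.5 stands in for the trial's actual analysis, and the Beta prior is illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def power_one_sided(p, n, p0=0.5, alpha=0.05):
    """Approximate power of a one-sided z-test of H0: p = p0 vs p > p0."""
    z_crit = stats.norm.ppf(1 - alpha)
    # rejection threshold on the sample-proportion scale under H0
    thresh = p0 + z_crit * np.sqrt(p0 * (1 - p0) / n)
    se = np.sqrt(p * (1 - p) / n)
    return stats.norm.sf((thresh - p) / se)

def assurance(n, a, b, n_draws=20_000):
    """Monte Carlo assurance: power averaged over a Beta(a, b) prior on p."""
    p = rng.beta(a, b, size=n_draws)
    return power_one_sided(p, n).mean()
```

A prior concentrated above p0 yields high assurance, while a prior concentrated below p0 yields low assurance, however large the sample.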
In this case, the elicited values are used as a basis for defining the prior distribution. After eliciting the quantile values, a Beta distribution was fitted to each parameter for each expert.
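One way to fit a Beta distribution to elicited quantiles is least squares on the quantile values. The sketch below uses illustrative elicited judgements, and maps the elicited minimum and maximum to the 1% and 99% points; that mapping is an assumption of this sketch, not the study's stated procedure:

```python
import numpy as np
from scipy import optimize, stats

# Illustrative elicited judgements (not the study's): minimum, lower quartile,
# median, upper quartile, maximum. Treating the elicited minimum and maximum
# as the 1% and 99% points is an assumption made for this sketch.
probs = np.array([0.01, 0.25, 0.50, 0.75, 0.99])
elicited = np.array([0.40, 0.55, 0.62, 0.70, 0.85])

def quantile_loss(params):
    """Sum of squared differences between Beta quantiles and elicited values."""
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    return np.sum((stats.beta.ppf(probs, a, b) - elicited) ** 2)

fit = optimize.minimize(quantile_loss, x0=[2.0, 2.0], method="Nelder-Mead")
a_hat, b_hat = fit.x
```

The fitted Beta then serves as that expert's prior for the parameter in the assurance calculation.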
Assurance calculations take into account the intended primary analysis of the trial. The planned analysis involved McNemar's test (McNemar, 1947). A Bayesian alternative to this test was constructed to allow for further comparison between power and assurance.
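McNemar's test compares paired binary outcomes using only the discordant pairs. A minimal example with illustrative counts (not trial data):

```python
from scipy import stats

# Discordant pair counts from a hypothetical paired diagnostic comparison:
# b = Awaji-positive / BIMC-negative, c = Awaji-negative / BIMC-positive.
b, c = 6, 18

# McNemar's chi-squared statistic depends only on the discordant pairs;
# concordant pairs carry no information about a difference between the tests.
chi2_stat = (b - c) ** 2 / (b + c)
p_value = stats.chi2.sf(chi2_stat, df=1)
```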
The log-ratio of two binomial random variables, \(X_1 \sim \text{Binomial}(n_1, p_1)\) and \(X_2 \sim \text{Binomial}(n_2, p_2)\), can be approximated by a normal distribution (Katz et al., 1978):
$$\text{log}\left(\frac{X_1}{X_2}\right) \sim \text{Normal}\left(\text{log}\left(\frac{p_1}{p_2}\right), \frac{1-p_1}{n_1 p_1}+\frac{1-p_2}{n_2 p_2}\right)$$
By setting \(X_1\) to be the number of positive Awaji test results at the first time point, \(X_2\) to be the number of positive BIMC test results at the first time point, \(p_1 = \eta\), and \(p_2 = \eta\theta_1 + (1-\eta)\varphi\theta_2 + (1-\eta)(1-\varphi)\theta_3\), the log-ratio can then be used to make inferences about the difference between the proportions of individuals diagnosed by the two tests.
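With these definitions, the normal approximation yields an approximate confidence interval for the log-ratio. The parameter values below are placeholders chosen for illustration, not elicited results, following the parameterisation in the text:

```python
import numpy as np

# Placeholder parameter values for illustration only (not elicited results);
# eta, phi, theta1-theta3 follow the parameterisation in the text.
eta, phi = 0.6, 0.5
theta1, theta2, theta3 = 0.9, 0.7, 0.4
n1 = n2 = 100

p1 = eta
p2 = eta * theta1 + (1 - eta) * phi * theta2 + (1 - eta) * (1 - phi) * theta3

# Normal approximation to the log-ratio (Katz et al., 1978)
mean_lr = np.log(p1 / p2)
var_lr = (1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2)
ci = (mean_lr - 1.96 * np.sqrt(var_lr), mean_lr + 1.96 * np.sqrt(var_lr))
```

An interval excluding zero indicates a detectable difference between the proportions diagnosed by the two tests at these parameter values.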
Unlike statistical power, the maximum value assurance can take depends on the prior distribution. Assurance can be standardised across prior distributions by dividing by the maximum possible value, which rescales assurance to lie between zero and one. This is referred to as scaled assurance (Alhussain & Oakley, 2020). The scaled assurance can be interpreted as the proportion of the maximum achievable assurance that is attained. We focus on calculating scaled assurance to allow for comparisons across different prior distributions.
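Scaled assurance can be sketched by dividing the assurance at the design's sample size by its limiting value at a very large sample size. The example below uses an illustrative one-sided test of a single proportion and a symmetric Beta(3, 3) prior; neither is the trial's actual analysis or elicited prior:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def assurance(n, a, b, p0=0.5, alpha=0.05, n_draws=20_000):
    """Monte Carlo assurance for an illustrative one-sided z-test of H0: p = p0."""
    p = rng.beta(a, b, size=n_draws)
    thresh = p0 + stats.norm.ppf(1 - alpha) * np.sqrt(p0 * (1 - p0) / n)
    power = stats.norm.sf((thresh - p) / np.sqrt(p * (1 - p) / n))
    return power.mean()

# The maximum achievable assurance is the limit as n grows without bound,
# approximated here by evaluating at a very large n. For a symmetric prior
# around p0, this is roughly the prior probability that p exceeds p0.
max_assurance = assurance(10**6, 3, 3)
scaled_assurance = assurance(100, 3, 3) / max_assurance
```

Here the symmetric prior caps the assurance near 0.5, so a raw assurance of, say, 0.35 corresponds to a much higher scaled assurance.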