In this study, the question format affected the pass mark set by standard setting judges when they were given access to the answers (as is usual in standard setting practice). When standard setters were shown the answers for VSAQs, they produced a significantly lower pass mark for the VSAQ paper. It has been shown that students score an average of 21 percentage points lower on VSAQs 3, and this study suggests that this is taken into account to some extent by standard setters with access to the answers, who set a median pass mark almost 14 percentage points lower for the VSAQs.
In addition, we investigated whether different pass marks are set when standard setting judges do or do not have access to the answers in VSAQs vs SBAQs. Standard setters who could see the answers set a lower median pass mark for the VSAQs. We hypothesise that this is related to having access to the range of accepted VSAQ answers, which gives a concrete indication of the degree of difficulty of the question. This contrasts with studies of multiple choice questions, where it has been suggested that access to the answers is likely to cause judges to underestimate the difficulty of the question 12, or that access to the answers made no difference to the standard set using the Ebel method 13.
A limitation of our study was that we were not able to hold a group discussion, as is considered best practice when standard setting for high stakes assessment. Group standard setting meetings allow sharing of information and discussion of questions, which often results in constructive revision of scores 8. Group discussion has also been shown to improve method reliability and reduce the number of judges required 15. This is a valuable part of the process, especially when members of the standard setting panel may be less familiar with VSAQs and so are likely to benefit from shared experience. Providing standard setters with question facility for VSAQs, or typical performance differences for VSAQs versus SBAQs, may also help when setting standards for this unfamiliar question type. This was not done in our study because the questions were used formatively, so performance data are unlikely to be a true reflection of summative assessment.
A further limitation of our study is the small sample size, which was due to the limited number of suitable faculty members within one institution. We limited eligible participants to those who had experience of final medical school examinations and were involved in delivering the undergraduate curriculum, to ensure the highest quality judges in our standard setting panel. A future study of standard setters across a wider cohort of medical schools would allow a larger sample size and could also examine judges' characteristics (for example, average age, years of standard setting experience, and years spent in undergraduate teaching) that may affect the standards they set.
As far as we are aware, our study is the first to look at standard setting for VSAQs. It is also the first study to demonstrate the importance of the standard setting panel having access to the answers when scoring VSAQs. As VSAQs are increasingly introduced into undergraduate medical assessments, this opens the discussion of what must be considered when identifying the ideal standard setting method for this novel question format. Our study demonstrates the feasibility of using the Angoff method to standard set this novel question type in undergraduate medical education. In addition, it provides a platform for further research, including comparison with other recognised methods of standard setting: the Cohen method, which would set a standard in relation to the performance of the cohort, and the Ebel method, which asks judges to consider the importance of the knowledge tested as well as the difficulty of the question.