In this study, the question format affected the pass mark set by standard setting judges when they were given access to the answers (as is usual in standard setting practice). When standard setters were shown the answers for VSAQs, they produced a significantly lower pass mark for the VSAQ paper. It has been shown that students score an average of 21 percentage points lower on VSAQs 3, and this study suggests that this is taken into account to some extent by standard setters with access to the answers, who set a median pass mark almost 14 percentage points lower for the VSAQs.
In addition, we investigated whether different pass marks are set when standard setting judges do or do not have access to the answers in VSAQs vs SBAQs. Standard setters who could see the answers set a lower median pass mark for the VSAQs. We hypothesise that this is related to having access to the range of accepted VSAQ answers, which gives a concrete indication of the degree of difficulty of the question. This contrasts with studies of multiple choice questions, where it has been suggested that access to the answers is likely to cause judges to underestimate the difficulty of the question 12, or that access to the answers made no difference to the standard set using the Ebel method 13.
A limitation of our study was that we were not able to hold a group discussion, as is considered best practice when standard setting for high stakes assessment. Group standard setting meetings allow sharing of information and discussion of questions, which often results in constructive revision of scores 8. Group discussion has also been shown to improve method reliability and reduce the number of judges required 15. This is a valuable part of the process, especially when members of the standard setting panel may be less familiar with VSAQs and so are likely to benefit from shared experience. Providing standard setters with question facility for VSAQs, or typical performance differences for VSAQs versus SBAQs, may also help when setting standards for this unfamiliar question type. This was not done in our study because the questions were used formatively, so performance data are unlikely to be a true reflection of summative assessment.
A further limitation of our study is the small sample size, which was due to the limited number of suitable faculty members within one institution. We limited eligible participants to those who had experience of final medical school examinations and were involved in delivering the undergraduate curriculum, to ensure the highest quality judges in our standard setting panel. A future study of standard setters across a wider cohort of medical schools would allow a larger sample size and could also examine judges' characteristics (for example, average age, years of standard setting experience, and years spent in undergraduate teaching) that may affect the standards they set.
As far as we are aware, our study is the first to look at standard setting for VSAQs. It is also the first study to demonstrate the importance of the standard setting panel having access to the answers when scoring VSAQs. As VSAQs are increasingly introduced into undergraduate medical assessments, this opens the discussion of what must be considered when identifying the ideal standard setting method for this novel question format. Our study demonstrates the feasibility of using the Angoff method to standard set this novel question type in undergraduate medical education. In addition, it provides a platform for further research, including comparison with other recognised methods of standard setting: the Cohen method, which would set a standard in relation to the performance of the cohort, and the Ebel method, which asks judges to consider the importance of the knowledge tested as well as the difficulty of the question.