Multiple-choice questions in preclinical and clinical phase examinations for an undergraduate medical program in a Malaysian medical school: A 2-year analysis

Abstract

Background: Multiple-choice questions (MCQs) are used to measure students' progress, and post-examination analysis is usually done to confirm an item's suitability for question banking. Item analysis is a simple and effective method of determining the quality of MCQs using three parameters: the difficulty or passing index (PI), the discrimination index (DI) and the distractor efficiency (DE).

Methods: This study analysed the MCQs in the preclinical and clinical examinations of the Doctor of Medicine program of Universiti Putra Malaysia. Forty MCQs with four options each from the preclinical examination and 80 MCQs with five options each from the clinical examination papers of 2017 and 2018 were analysed and compared.

Results: The mean DI was similar across all examinations, except for a significant reduction in the 2018 clinical examination. From 2017 to 2018, the preclinical MCQs showed an increase in the number of 'excellent' and 'good' items, whereas the clinical papers showed a reduction in DI due to a high number of 'poor' questions. Comparing both years, there was an increase in the number of items with no non-functioning distractors in both examinations. The 2018 preclinical MCQs showed the highest mean DE overall.

Conclusion: Our findings suggest that item authors from the preclinical phase improved in constructing good-quality MCQs, while clinical phase authors may need more training and continuous feedback. A higher number of options did not affect the difficulty level of a question, although the discrimination power and the effectiveness of the distractors might differ.


Introduction

The Doctor of Medicine (MD) curriculum at Universiti Putra Malaysia (UPM) consists of two phases, preclinical and clinical, which run for two and three years, respectively. Students sit a summative assessment at the end of each phase and must pass the preclinical examination to proceed to the clinical phase. In the fifth year, they need to sit and pass the clinical phase examination before being awarded the degree of medicine. In the written examination, various assessment tools are used, such as MCQs, short answer questions and modified essay questions.

In any medical examination, the MCQ is one of the most important and well-established written assessment tools, widely used for its distinct advantage of evaluating a broad coverage of concepts in less time. The scoring is also objective and reliable [1]. A type A MCQ item (single best response) consists of a 'stem' or 'vignette', followed by a 'lead-in' statement and several options. The correct answer among the options is called the 'key' and the incorrect options are called 'distractors'.

The validity of an assessment refers to the evidence presented to support or refute the meaning or interpretation assigned to the assessment data [2]. Validity, therefore, is the degree to which a test measures what it is supposed to measure. This includes test item analysis, which is usually done after the assessment has been completed to determine the candidate responses to individual test items, the quality of those items and the overall assessment. The difficulty or passing index (PI), discrimination index (DI) and distractor efficiency (DE) of each item can be obtained from these analyses, which reflect the quality of the test items. The PI of an item is commonly defined as the percentage of students who answered the item correctly. The DI, on the other hand, is defined as the degree to which an item discriminates between students of high and low achievement. The DE assesses the credibility of the distractors in an item, that is, whether they are able to draw students away from the correct answer [3]. Any distractor selected by fewer than 5% of the students is considered a non-functional distractor (NFD). The advantages of the analysis are that it helps identify faulty items, identifies lower performers and their learning problems (such as misconceptions) as a guide for remedial action, and, just as importantly, improves educators' skills in constructing high-quality test items.

For these reasons, we analysed the MCQs given in the preclinical and clinical phase examinations in the years 2017 and 2018. The PI, DI and DE of the items were compared between the examinations of both years as well as between the two phases. Based on the findings, good-quality items and revised items shall be stored in the question bank and faulty items shall be discarded.
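The anatomy of a Type A item described above can be sketched as a small data structure. This is purely an illustration: the class, its field names and the sample item are hypothetical and are not drawn from the examinations analysed in this study.

```python
from dataclasses import dataclass, field

@dataclass
class TypeAMCQ:
    """A single-best-response (Type A) MCQ: a stem/vignette, a lead-in,
    one key and several distractors. Field names are illustrative."""
    stem: str                     # clinical vignette or scenario
    lead_in: str                  # the question actually being asked
    key: str                      # the single correct option
    distractors: list = field(default_factory=list)  # incorrect options

    def options(self) -> list:
        """All answer options: the key plus the distractors."""
        return [self.key] + self.distractors

# Hypothetical four-option item, as used in the preclinical papers:
item = TypeAMCQ(
    stem="A 25-year-old man presents with fever and a productive cough.",
    lead_in="Which organism is the most likely cause?",
    key="Streptococcus pneumoniae",
    distractors=["Mycoplasma pneumoniae", "Haemophilus influenzae",
                 "Legionella pneumophila"],
)
assert len(item.options()) == 4
```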

Methods

Item analysis
Post-validation was done automatically by item analysis using an OMR machine (Scantron iNSIGHT 20 OMR scanner, Minnesota, USA). The scores of all students in each examination paper were arranged in order of merit. The upper 27% of students were considered 'top' students and the lower 27% 'poor' students. Each item was analysed for the difficulty and discrimination indices according to Hassan & Hod [4] and Abdul Rahim [5]. The interpretation of the PI and DI values is presented in Table 1. A non-functional distractor (NFD) is an option selected by fewer than 5% of students. Based on the NFDs in an item, DE ranges from 0% to 100%. If an item with four options contains three, two, one or no NFDs, its DE would be 0%, 33.3%, 66.7% and 100%, respectively. If an item with five options contains four, three, two, one or no NFDs, its DE would be 0%, 25%, 50%, 75% and 100%, respectively.

Results

A total of 240 MCQs were analysed and the average PI and DI were determined. Overall, the difficulty level of the questions was similar in both preclinical and clinical phase examinations (Table 2). There was an increase in the total percentage of 'excellent' and 'good' items in the 2018 preclinical phase examination, from about 38% to 70% (Table 4). In the 2017 clinical phase examination, half of the questions were 'excellent' or 'good'; however, this percentage fell to 36% in 2018 due to the high proportion of 'poor' questions in that examination (53%).
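The indices described above can be sketched in code. The upper/lower 27% split and the under-5% NFD rule follow the text; the DI formula (proportion-correct difference between the top and bottom groups) is the common Kelley-style formulation and is an assumption here, as the exact formulas of Hassan & Hod [4] are not reproduced in this paper.

```python
# Minimal sketch of PI, DI and DE, assuming the conventional upper/lower
# 27% formulation for DI; the worked numbers below are hypothetical.

def passing_index(n_correct: int, n_students: int) -> float:
    """PI: fraction of all students who answered the item correctly."""
    return n_correct / n_students

def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """DI: difference in proportion correct between the upper and lower
    27% groups; ranges from -1 to +1."""
    return (upper_correct - lower_correct) / group_size

def distractor_efficiency(option_counts: dict, key: str,
                          n_students: int) -> float:
    """DE: percentage of distractors chosen by at least 5% of students
    (i.e. functional distractors)."""
    distractors = [opt for opt in option_counts if opt != key]
    nfd = sum(1 for opt in distractors
              if option_counts[opt] / n_students < 0.05)
    return 100.0 * (len(distractors) - nfd) / len(distractors)

# Hypothetical four-option item sat by 100 students; "A" is the key.
counts = {"A": 60, "B": 25, "C": 12, "D": 3}
pi = passing_index(counts["A"], 100)          # 0.60
di = discrimination_index(24, 10, 27)         # (24 - 10) / 27 ~ 0.52
de = distractor_efficiency(counts, "A", 100)  # one NFD ("D") -> 66.7%
```

Note that the DE value reproduces the four-option mapping above: one NFD out of three distractors gives 66.7%.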

Additionally, there were five questions with zero DI in the paper: one with a PI of 1 and another with a PI of 0. The total number of non-functioning distractors (NFDs) was reduced in 2018 in both examination phases (Table 5).

Discussion

The end-of-phase examination in the UPM MD program is a high-stakes summative assessment at the end of the preclinical and clinical phases. For the preclinical phase examination, the results determine whether preclinical students are eligible to progress to the clinical phase, while final year students need to pass the clinical phase examination in order to graduate. Therefore, valid assessment tools are needed to measure students' knowledge, skills and attitude in the examination. One of the tools used to test the 'knows' and 'knows how' levels of Miller's pyramid is the MCQ [7]. It is useful for measuring factual recall, but it can also test higher-order thinking skills such as application, analysis, synthesis and evaluation of knowledge, which are important for medical graduates. Post-validation of test items using item analysis of PI, DI and DE is a simple yet effective method of assessing the validity of the test. In the present study, we analysed the MCQs from both preclinical and clinical phase examinations, taken by two different cohorts of students. Each MCQ in the preclinical phase examination has four options, while each in the clinical phase examination has five.

Based on the findings, the mean PI in both examinations was similar in both years. This indicates that an increased number of options, five versus four, does not have a significant impact on the difficulty level of the examination. This is supported by a previous study by Schneid et al. [8], which found no significant differences in difficulty level among MCQs with three, four or five options. In contrast, Vegada et al. [9] found a slight decrease in difficulty level when the options were reduced from five to four, and the items became much easier when the options were reduced to three. They concluded that items become easier with fewer options because of the increased probability of selecting the correct answer by random guessing.

The preclinical phase examination showed a consistent level of item difficulty in both years. Yet half of the questions were 'very easy' or 'difficult'. These questions seem unsuitable for assessing students in a high-stakes examination, as they were unable to discriminate between the good and the weak students. Hence, these questions should be revised by changing either the vignette or the options. All difficult questions should be reviewed for their language and grammar, ambiguity and controversial statements […]. It has been suggested that continuous training and feedback be given to the question authors so that the number of 'difficult' and 'very easy' questions can be reduced in the future.

The mean DI for the preclinical phase examination in both years and for the 2017 clinical phase examination ranged from 0.25 to 0.31, which is considered 'good'. An earlier study showed that a comparable mean DI indicates that the quality of questions has been consistent across the years [4]. Nevertheless, the mean DI in the clinical phase examination reduced significantly in 2018. This may be due to the high number (66%) of 'very easy' and 'difficult' questions in that examination. Consequently, about 53% of the questions were considered 'poor' and were not able to discriminate between the good and weak students.
A test item should ideally be able to pick out the 'good' students from the 'poor' ones, in that more 'good' than 'poor' students are able to answer it correctly [5]. In the present study, some questions were found to have zero or negative DI. A zero DI means the item was non-discriminating: either all students answered it correctly, an equal number of 'good' and 'poor' students answered it correctly, or none of the 'good' and 'poor' students managed to answer it correctly. A negative DI indicates that more 'poor' than 'good' students answered the item correctly. In this study, one question was found to have both zero DI and zero PI, demonstrating that an extremely low number of students answered it correctly and that none of them were from the 'good' or 'poor' groups. We speculate that the reasons were ambiguous framing of the question, poor preparation of students or, perhaps, a wrong answer key [10, 13, 14]. One other question had zero DI and a PI of 1, demonstrating that all students answered it correctly, probably because it was too easy. Both too difficult and too easy questions may contribute to the 'poor' questions, based on the dome-shaped correlation between PI and DI [12, 15]. These questions were not useful, may reduce the validity of the test, and should therefore be eliminated. […] is also needed before the questions are used in an examination. With this analysis, feedback should be given to all authors for them to reflect upon and revise their questions accordingly. The number of NFDs also affects the discrimination power of an item [11]. From this study, […]

A meta-analysis by Rodriguez [16] found that having three options in an item is adequate: even though the difficulty level is lowered, the item is more discriminating and more reliable. This is supported by a more recent study which found that questions with as few as three options still produce good reliability and are less laborious to construct [17]. However, this means that students have a higher chance (1 in 3) of answering the item correctly by random guessing.

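The random-guessing argument is simple expected-value arithmetic. The sketch below is purely illustrative: the 80-item paper length mirrors the clinical paper in this study, but the calculation itself is not taken from any cited reference.

```python
# Expected number of items answered correctly by blind guessing on
# single-best-answer MCQs with n equally plausible options.
# Illustrative arithmetic only, not data from the cited studies.

def expected_guess_score(n_items: int, n_options: int) -> float:
    """Each item contributes a 1/n_options chance of a lucky hit."""
    return n_items / n_options

# On a hypothetical 80-item paper:
print(expected_guess_score(80, 5))  # 16.0 items by chance alone
print(expected_guess_score(80, 4))  # 20.0
print(expected_guess_score(80, 3))  # ~26.7
```

Removing options therefore raises the floor set by guessing, which is one reason cut-score decisions for 3-option MCQs need further study.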
Royal & Dorman [18] highlighted that 3-option and 4-option MCQs have similar psychometric properties, which means the former is as effective as the latter. Nevertheless, the traditional 4-option MCQ shall be maintained, as more research is needed to better understand the effects of 3-option MCQs on guessing strategies and cut-score determination, and to avoid any unintended consequential validity issues [19].

The present study highlights some interesting findings. First, an increased number of […]

Several limitations should be noted. First, this study was confined to one educational setting, limiting its generalizability; any attempt to extend the findings to other settings must be made with caution. Second, several other parameters, such as internal consistency and the correlation between PI and DI, were not measured. Lastly, some variables, such as the authors' previous training in writing MCQs and the students' characteristics, were not controlled for, which may have affected the findings.

Conclusion
The findings suggest standardizing the number of options to four, as this does not much affect the difficulty level of the questions but improves the items' ability to discriminate between high and low achievers. It will also make it easier for authors to write MCQs with equally plausible distractors. More training is required for the authors, especially those from the clinical phase, to bring the quality of their items up to that seen among the preclinical authors. Feedback should be given to all authors after the analysis so that they can reflect and improve. Good-quality items shall be stored in the question bank, while poor ones shall be discarded.

Data availability

The data in the present study are available from the corresponding author on reasonable request.

Acknowledgement
The authors wish to thank the Dean and Deputy Dean (Academic) of the Faculty of Medicine and Health Sciences, UPM, for their support and guidance. Appreciation also goes to the staff in the Academic