The present study addressed the quality of assessment items in sixth year PharmD clinical clerkships examinations. The study provided three interesting and valuable outcomes that can be of benefit to academic staff and preceptors. (1) The reliability of an examination correlated significantly with items psychometric parameters, (2) the Bloom’s levels associated with an item significantly affected its psychometric properties, and the (3) structure of an item and the number of options possessed by an item significantly affected the psychometric parameters of the item.
The predominant item format in the current study was case based, in which the basic level competencies; remembering and understanding skills, constituted the majority; around two-thirds, of the measured skills. These competencies are the foundation for the higher competencies levels (e.g. analysis, application and evaluation and creation skills). The use of case based items in the assessment of students in a health care professional program, such as a Pharm D program, is necessary. A case based item acts to introduce students to clinical scenarios that simulate patient situation, and enables them to practice decision making during realistic challenges.
Building case based items is a time consuming task and requires a knowledgeable and practice expert examiner [3, 7]. The psychometric parameters of the studied assessment items in our study showed their high quality with less than 8% classified as poor or flawed items [16, 23]. The benefits implied by the use of case based items and items psychometrics parameters, in addition to the high values of Cronbach’s Alpha of each examination evidenced the high reliability of exams under study [21].
Evaluation of the effect of competency levels on the difficulty of an item showed that items addressing analysis skills are more difficult; on the other end of the scale are knowledge and understanding skills which were much easier. These findings are in agreement with the findings of Kim and colleagues (2012), where they found that analysis and synthesis items are more difficult [24].
The evaluation on discrimination measures (discrimination and point biserial) of assessment items addressing remembering and understanding skills and analysis skills are more efficient in differentiating between students in upper and lower grade quartiles.
Analysis on difficulty, discrimination, and point biserial of item formats demonstrated no differences between case based and noncase based items in terms of difficulty and point biserial, and with a slightly better discrimination of noncase based items. These results are similar to that of Phipps and Brackbills (2009) findings [3], demonstrating comparable capability of these two item formats.
The type of an item has significant effect on its psychometric characteristics. Open ended type was easier, yet more discriminative; this tallies well with Thawabieh (2016) findings [19]. It is understood that the nature of open ended items allows for the incorporation of more details when answered by students, while utilizing higher thinking orders allows for better discrimination between high- and low-performance students. On the other hand, the options in MCQs may provide a hint to students on the item-writer intention [24].
The number of options an item possessed showed significant impact on both difficulty and discrimination levels measured as difficulty and point biserial, respectively. The higher the number of options the easier the item and more discriminative. This is in partial agreement with Phipps and Brackbills (2009) findings where they found that 5-options are more difficult and more discriminative. Despite that, they concluded that due to the very small differences between these two groups, it is explainable/justifiable to use a mix of 4 and 5 responses MCQs in exams [3].
Analyzing case based items and noncase based items separately revealed different behaviors. Case based items that are open ended are significantly easier and more discriminative than MCQs, while the same type of noncase based items is more difficult and more discriminative. This can be attributed to the fact that case based items provide scenarios that may simplify the item and guide the examinee but still need to be seen in context.
The number of answer options (4 or 5), had no effect in discrimination metrics of either case based or noncase based assessment items, and it only affected the difficulty of case based items, as 4-option questions were more difficult. The idea of writing more plausible and effective options other than the key answer when an item is based on a case that’s full of details is clearly more challenging and difficult.
Open ended items that are noncase based are more difficult and more discriminative in comparison with open ended that are case based. In addition, MCQs that are noncase based have larger discrimination and point biserial; showing that noncase based items are more discriminative. Again, case based items were shown to have similar, if not inferior, behavior to noncase items, limiting their benefit to their ability to address intended learning and course aims, but expressing no unique performance assessment characteristics.
One more result of the current study was the different effect of the item format on the characteristics of 4- and 5-option MCQs. Noncase based, 4-option MCQs items were significantly easier and more discriminative than case based 4-option MCQs. However, case based and noncase based 5-option MCQs items had no differences in difficulty and DI, and differed slightly in point biserial while remaining within the same recommended point biserial value range.
The previous results showed differences between the two MCQs groups yet cannot be conclusive, as it once again a very challenging time-consuming task not only to construct a case item but also to construct strong, reliable, and efficient choices during the creation of MCQs regardless the item format; being based on case or not.
In a study conducted by Sheaffer and Addo (2013), where they measure both second year Pharm D students’ performance and confidence in answering selected-response and constructed-response items, it was concluded that students performed better and felt more confident in answering selected-response items. Moreover, the incorporation of constructed-response teaching and testing method in pharmacy learning and education was recommended [13].
One important limitation of the present study is the unequal number of items per groups which may have affected the analysis. On the other hand, items classification based on the Bloom’s levels might be subjective [8]; for this reason, peer reviewers represented by both, preceptors as the personnel in direct connection with “real life” cases, and Academic staff/educators to minimize the controversy. Another issue worth mentioning is that our analysis is based on the Classical Test Theory; an alternative approach to evaluate the properties of items is the Item Response Theory which is based on the study of test and item scores based on assumptions concerning the mathematical relationships between abilities and item responses [26].