The present study addressed the quality of assessment items in sixth-year PharmD clinical clerkship examinations. The study provided three valuable outcomes that can benefit academic staff and preceptors: (1) the reliability of an examination correlated significantly with its items’ psychometric parameters, (2) the Bloom’s level associated with an item significantly affected its psychometric properties, and (3) the structure of an item and its number of options significantly affected its psychometric parameters.
The predominant item format in the current study was case based, in which the basic-level competencies (remembering and understanding skills) constituted the majority, around two-thirds, of the measured skills. These competencies are the foundation for the higher competency levels (e.g. analysis, application, evaluation and creation skills). The use of case based items in the assessment of students in a health care professional program, such as a PharmD program, is necessary. A case based item introduces students to clinical scenarios that simulate patient situations and enables them to practice decision making under realistic challenges.
Building case based items is a time-consuming task that requires a knowledgeable examiner with practice expertise [3, 7]. The psychometric parameters of the assessment items in our study indicated high quality, with less than 8% classified as poor or flawed items [16, 23]. The benefits of using case based items and the items’ psychometric parameters, together with the high Cronbach’s alpha of each examination, evidenced the high reliability of the exams under study.
Evaluation of the effect of competency levels on item difficulty showed that items addressing analysis skills were more difficult; at the other end of the scale, items addressing knowledge and understanding skills were much easier. These findings agree with those of Kim and colleagues (2012), who found that analysis and synthesis items were more difficult.
Evaluation of the discrimination measures (discrimination index and point biserial) showed that assessment items addressing remembering and understanding skills and analysis skills were more efficient in differentiating between students in the upper and lower grade quartiles.
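The three item statistics discussed here can be illustrated with a minimal sketch. It assumes the standard Classical Test Theory definitions for dichotomously scored (0/1) items and upper and lower groups formed from the top and bottom quarter of total scores; the function names and the sample data are hypothetical and are not taken from the study, whose exact computation is not described in this section.

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Proportion of examinees answering correctly (higher value = easier item)."""
    return mean(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.25):
    """Proportion correct in the upper group minus that in the lower group.

    Groups are the top and bottom `fraction` of examinees by total score
    (quartiles by default, an assumption of this sketch). Ties at the
    group boundary are broken by input order.
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return mean(upper) - mean(lower)

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the total exam score."""
    p = mean(item_scores)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when everyone (or no one) is correct
    m1 = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    m0 = mean(t for s, t in zip(item_scores, total_scores) if s == 0)
    return (m1 - m0) / sd * (p * (1 - p)) ** 0.5

# Hypothetical data: one item answered by eight students, with their exam totals.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [18, 17, 15, 14, 12, 9, 8, 6]

difficulty = item_difficulty(item)
discrimination = discrimination_index(item, totals)
r_pb = point_biserial(item, totals)
print(difficulty, discrimination, round(r_pb, 3))
```

In this sketch the total score includes the item itself; some analyses instead correlate the item with the total of the remaining items, which gives slightly lower values.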
Analysis of item formats demonstrated no differences between case based and noncase based items in terms of difficulty, discrimination and point biserial. These results are similar to the findings of Phipps and Brackbills (2009), demonstrating the comparable capability of the two item formats.
The type of an item had a significant effect on its psychometric characteristics. Open ended items were easier, yet more discriminative; this tallies well with the findings of Thawabieh (2016). The nature of open ended items allows students to incorporate more detail in their answers, while the use of higher-order thinking allows for better discrimination between high- and low-performing students. On the other hand, the options in MCQs may give students a hint about the item writer’s intention.
The number of options an item possessed showed a significant impact on difficulty and none on discrimination, measured as discrimination index and point biserial. The higher the number of options, the easier the item was and, slightly but not significantly, the more discriminative. This is in partial agreement with the findings of Phipps and Brackbills (2009), who found that 5-option items were more difficult and more discriminative. Nevertheless, they concluded that, because the differences between the two groups were very small, it is justifiable to use a mix of 4- and 5-option MCQs in exams.
Analyzing case based and noncase based items separately revealed different behaviors. Case based open ended items were significantly easier and more discriminative than MCQs, while noncase based open ended items were more difficult and more discriminative. This can be attributed to the fact that case based items provide scenarios that may simplify the item and guide the examinee but still need to be seen in context.
The number of answer options (4 or 5) had no effect on the discrimination metrics of either case based or noncase based assessment items; it affected only the difficulty of case based items, as 4-option questions were more difficult. Writing plausible and effective options other than the key answer is clearly more challenging when an item is based on a case that is full of details.
Open ended items that are noncase based were slightly more difficult and more discriminative than open ended items that are case based. In addition, noncase based MCQs had larger discrimination and point biserial values, showing that noncase based items are more discriminative. Again, case based items were shown to have similar, if not inferior, behavior to noncase based items, limiting their benefit to their ability to address intended learning and course aims, with no unique performance assessment characteristics.
One more result of the current study was the comparable effect of the item format on the characteristics of 4- and 5-option MCQs. Noncase based 4-option MCQ items were significantly easier than case based 4-option MCQs, with similar discriminative power. However, case based and noncase based 5-option MCQ items showed no differences in discrimination and differed slightly in difficulty, with case based items being easier.
The previous results showed differences between the two MCQ groups yet cannot be conclusive, as it is once again a very challenging and time-consuming task not only to construct a case item but also to construct strong, reliable, and efficient choices when creating MCQs, regardless of whether the item is case based or not.
In a study by Sheaffer and Addo (2013), which measured second-year PharmD students’ performance and confidence in answering selected-response and constructed-response items, it was concluded that students performed better and felt more confident in answering selected-response items. Moreover, the incorporation of constructed-response teaching and testing methods in pharmacy education was recommended.
It is understandable in a study like ours that the classification of items based on Bloom’s levels might be subjective; we attempted to minimize this by drawing on the experience of clinical preceptors in direct contact with “real life” cases and of academic staff/educators as peer reviewers of the studied items. Another issue to consider in the present study is that the unequal number of items per rotation could have had an effect on the analysis.
The current study based its analysis on Classical Test Theory; it would be interesting and useful to evaluate item properties with alternative approaches such as Item Response Theory, which studies test and item scores based on assumptions concerning the mathematical relationships between abilities and item responses. Another potentially useful analytical approach involves testing Bloom’s levels and item properties in the same model, which will be attempted in future studies. It would also be of great value to include
One last important limitation of the current study is the use of Cronbach’s alpha as the only measure of exam internal consistency, which could be affected by the number of items in each of the tested examinations. An alternative approach would be to supplement Cronbach’s alpha with other indices of internal consistency.
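The dependence of Cronbach’s alpha on test length can be made concrete with a small sketch. It assumes the standard formula for alpha and uses the Spearman-Brown prophecy formula, a general psychometric result that was not part of this study’s analysis, to project how reliability would change if an exam were lengthened or shortened with comparable items. The function names and the score matrix are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (one row per student, one column per item)."""
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by `length_factor`."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical data: five students answering four dichotomous items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
alpha_if_doubled = spearman_brown(alpha, 2)    # twice as many comparable items
alpha_if_halved = spearman_brown(alpha, 0.5)   # half as many items
print(round(alpha, 3), round(alpha_if_doubled, 3), round(alpha_if_halved, 3))
```

Because the projected value rises with the number of items, alpha values from examinations of different lengths are not directly comparable, which is the concern raised above.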