The present study addressed the quality of sixth-year PharmD final examinations. It provided three interesting and valuable outcomes that can benefit academic staff and preceptors: (1) the reliability of an examination correlated significantly with item psychometric parameters, (2) the association of an item with multiple Bloom's levels affected the discriminative power of the item, and (3) the structure of an item and the number of options it possessed significantly affected its psychometric parameters.
The predominant item format in the current study was case based, in which the basic-level competencies (knowledge and understanding skills) constituted the majority, around two-thirds, of the measured skills. These competencies are the foundation for the higher competency levels (e.g., analysis and patient-specific skills). The use of case-based items in the assessment of students in a health care professional program, such as a PharmD program, is necessary: it introduces students to clinical scenarios that simulate patient situations and enables them to practice decision making during realistic challenges.
Building case-based items is a time-consuming task that requires a knowledgeable examiner with practical expertise [2, 4]. The psychometric parameters of the assessment items in our study showed their high quality, with less than 8% classified as poor or flawed items [13, 20]. The benefits implied by the use of case-based items and the item psychometric parameters, in addition to the high Cronbach's alpha values of each examination, evidenced the high quality and reliability of the exams under study [17].
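Cronbach's alpha, the reliability coefficient referred to above, can be computed directly from an items-by-students score matrix. The following is a minimal sketch, not the study's actual analysis code, and the score matrix is hypothetical; it applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
# Illustrative sketch (not from the study): Cronbach's alpha for an exam,
# given one row of item scores per student.

def cronbach_alpha(scores):
    """scores: list of rows, one row per student, one column per item."""
    k = len(scores[0])  # number of items

    def variance(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

if __name__ == "__main__":
    # Hypothetical dichotomous (0/1) scores: six students, four items.
    demo = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(round(cronbach_alpha(demo), 3))  # → 0.667
```

In practice, reliability coefficients like those reported here would be computed over the full student cohort and all exam items.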
Evaluation of the effect of competency levels on item difficulty showed that items addressing intellectual and analysis skills were more difficult, whereas, at the other end of the scale, items addressing knowledge and understanding skills were much easier. This is understandable, since students who lack the basic knowledge cannot derive correct responses. These findings agree with those of Kim and colleagues (2012), who found that analysis and synthesis items are more difficult [21]. However, that study showed that the association of items with multiple concepts affects item difficulty, whereas in our study items with a single concept showed a significant effect on DIF I; this is probably because we assessed both MCQs and open-ended items, while their study included only MCQs.
The evaluation of the discrimination measures (DI and PBS) of assessment items addressing different competency levels indicates that items associated with multiple levels (knowledge and understanding items, and intellectual and analysis items) can differentiate significantly between students in the upper and lower grade quartiles. These results are comparable to the findings of Kim and colleagues (2012) that items associated with multiple functions, i.e., measuring more than one thinking order, are more discriminative [21].
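The difficulty and discrimination measures discussed here are straightforward to compute from dichotomous item scores. Below is an illustrative sketch (our own hypothetical implementation, not the study's software), taking DIF I as the proportion of correct answers, DI as the difference in correct-answer proportions between the top and bottom total-score quartiles, and PBS as the Pearson correlation between the 0/1 item score and the total score:

```python
# Illustrative sketch (not the study's software): DIF I, DI, and PBS
# for a single dichotomously scored (0/1) item.
from statistics import mean, pstdev

def item_stats(item_scores, total_scores):
    n = len(item_scores)

    # DIF I: proportion of students answering the item correctly.
    dif_i = mean(item_scores)

    # DI: rank students by total score, compare top vs. bottom quartile.
    order = sorted(range(n), key=lambda i: total_scores[i])
    q = max(1, n // 4)
    lower = [item_scores[i] for i in order[:q]]
    upper = [item_scores[i] for i in order[-q:]]
    di = mean(upper) - mean(lower)

    # PBS: point-biserial, i.e. Pearson correlation of a 0/1 item score
    # with the total score.
    mi, si = mean(item_scores), pstdev(item_scores)
    mt, st = mean(total_scores), pstdev(total_scores)
    cov = mean((i - mi) * (t - mt) for i, t in zip(item_scores, total_scores))
    pbs = cov / (si * st) if si and st else 0.0

    return dif_i, di, pbs
```

For example, a hypothetical item answered correctly by every student in the top quartile and by none in the bottom quartile yields the maximum DI of 1.0, regardless of its DIF I.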
Analysis of the DIF I, DI, and PBS of the item formats demonstrated no differences between case-based and non-case-based items in terms of DIF I and PBS, and only a slightly better DI for non-case-based items; the DI difference was relatively small, and both values remain within the same "good" DI category. These results are similar to the findings of Phipps and Brackbill (2009) [2], demonstrating comparable capability of the two item formats.
The type of an item had a significant effect on its psychometric characteristics. The open-ended type was easier yet more discriminative, similar to the findings of Thawabieh (2016) [16]. Understandably, the nature of open-ended items allows students to incorporate many details in their answers while utilizing multiple and higher thinking orders, allowing better discrimination between high- and low-performing students. On the other hand, the options in MCQs may orient students toward the item writer's intention [21].
The number of options an item possessed had a significant impact on both difficulty and discrimination, measured as DIF I and PBS, respectively: the higher the number of options, the easier and the more discriminative the item. This is in partial agreement with the findings of Phipps and Brackbill (2009), who found that 5-option items were more difficult and more discriminative. Nevertheless, they concluded that, given the very small differences between the two groups, it is justifiable to use a mix of 4- and 5-option MCQs in exams. Note also that their study analyzed both A-type and K-type MCQs [2].
Analyzing case-based and non-case-based items separately revealed different behaviors. Open-ended case-based items were significantly easier and more discriminative than case-based MCQs, whereas open-ended non-case-based items were more difficult as well as more discriminative than their MCQ counterparts. This can be attributed to the fact that case-based items provide scenarios that may simplify the item and guide the examinee, although this still needs to be seen in context.
The number of answer options (4 or 5) had no effect on the discrimination metrics of either case-based or non-case-based assessment items, and it affected only the difficulty of case-based items, with 4-option questions being more difficult. Writing plausible and effective options, beyond the key answer, for an item based on a case full of details is clearly more challenging.
Open-ended items that are non-case-based were more difficult and more discriminative than open-ended items that are case-based. In addition, non-case-based MCQs had larger DI and PBS values, showing that non-case-based items are more discriminative. Again, case-based items showed similar, if not inferior, behavior compared to non-case-based items, limiting their benefit to their ability to address intended learning outcomes and course aims while exhibiting no unique performance-assessment characteristics.
A further result of the current study was the differing effect of item format on the characteristics of 4- and 5-option MCQs. Non-case-based 4-option MCQs were significantly easier and more discriminative than case-based 4-option MCQs. However, case-based and non-case-based 5-option MCQs showed no differences in DIF I and DI, and differed only slightly in PBS while remaining within the same recommended PBS value range.
The previous results showed differences between the two MCQ groups but cannot be considered conclusive, as it is, once again, a very challenging and time-consuming task not only to construct a case item but also to construct strong, reliable, and efficient options when writing MCQs, regardless of whether the item is case based or not.
In a study by Sheaffer and Addo (2013), which measured both the performance and the confidence of second-year PharmD students in answering selected-response and constructed-response items, it was concluded that students performed better and felt more confident answering selected-response items. Moreover, the incorporation of constructed-response teaching and testing methods into pharmacy learning and education was recommended [8].
One important limitation of the present study is the unequal number of items per group, which may have affected the analysis. In addition, item classification based on Bloom's levels might be subjective [5]; for this reason, peer review was carried out by both preceptors, as the personnel in direct contact with real-life cases, and academic staff/educators, to minimize this subjectivity.
The findings of our study uncovered an important issue: specifically, whether our students are prepared and trained to deal with real-case situations, and whether the instructors and the teaching and learning methods used initiate and develop students' skills and abilities to do so. Another important question is whether instructors can, and do, verify that the teaching methodologies used are tailored to the type and structure of the exams. It would also seem important that instructors/examiners receive quality training in how to construct an examination/item [6]. Many pedagogical aspects and evaluation perspectives should be included in the evaluation of assessment tools.