Evaluation of the Effect of Items’ Format and Type on Psychometric Properties of Sixth Year Pharmacy Students Clinical Clerkship Assessment Items

doi:10.21203/rs.2.17768/v2


Research article

This work is licensed under a CC BY 4.0 License

Journal Publication: published 12 Jun, 2020, in BMC Medical Education.

This is an older preprint version (Version 2); a later preprint version is available.

Abstract

Background. Examinations are the traditional assessment tools. In addition to measuring learning, exams are used to guide the improvement of academic programs. The current study evaluated the quality of the assessment items of sixth year clinical clerkship examinations as a function of item format and type/structure, and assessed the effect of the number of response choices on the characteristics of MCQs as assessment items. Methods. A total of 173 assessment items used in the examinations of sixth year clinical clerkships of a PharmD program were included. Items were classified as case based or noncase based and as MCQs or open-ended. The psychometric characteristics of the items were studied as a function of the Bloom’s levels addressed, item format, and number of choices in MCQs. Results. Items addressing analysis skills were more difficult. No differences were found between case based and noncase based items in terms of difficulty, with slightly better discrimination in the latter. Open-ended items were easier, yet more discriminative. MCQs with a higher number of options were easier and more discriminative. As case based items, open-ended questions were significantly easier and more discriminative than MCQs, whereas as noncase based items they were more difficult and more discriminative. Conclusion. Item format, structure, and number of options in MCQs significantly affected the psychometric properties of the studied items. Noncase based items and open-ended items were easier and more discriminative than case based items and MCQs, respectively. Examination items should be prepared with these characteristics in mind to improve their psychometric properties and maximize their usefulness.

Internal Medicine

Assessment Items

Clinical Clerkships

Difficulty Index

Discriminating Index

Point Biserial

Introduction

Examinations are the traditional method that instructors have used to evaluate students’ performance throughout educational history [1]. Good-quality examinations are essential for generating reliable data to measure student learning, guide program improvements, and provide stakeholders with relevant information [2]. This places a particularly significant responsibility on educators who develop examination items [3].

The Accreditation Council for Pharmacy Education (ACPE) standards for Doctor of Pharmacy (PharmD) programs recommend the implementation of an extensive assessment plan to prepare graduates for practice [4]. Such a plan should include standardized, systematic, reliable, and valid assessment. The standards also require evaluation of both knowledge and performance, and measurement of the professional competencies achieved.

The quality of assessments is usually expressed in terms of their validity and reliability [5]. However, the quality of tests may also be inferred, at least partially, from the analysis of test items [6].

Consequently, it is essential to analyze and evaluate assessment items after they have been administered. Such analysis and evaluation are needed to improve the items and to specify their assessment characteristics: whether the item is performance-oriented, the order of thinking it evaluates, and its real-life context [7].

Assessment can be performed using different item formats and types; however, assessment items should be developed to address the expected position of a PharmD graduate in the healthcare team [8]. Assessment items can be classified by format as either case based or noncase based. An item in either format category can be further classified by type/structure as a multiple choice question (MCQ) or an open-ended/constructed-response question [9, 10].

The case based item format has a distinct advantage over the noncase based format: it can simulate realistic decision-making scenarios and allows students to solve problems and find alternative plans using detailed, individualized information [1, 3, 11]. MCQs are a popular assessment item type in which the examinee chooses the correct answer to the question “stem” from several possible answers. Properly constructed MCQs allow the examiner to cover a variety of learning objectives [3, 9, 12]. On the other hand, an evaluation based on open-ended/constructed-response items allows the exploration of various alternatives rather than concentrating on one correct answer, and engages higher-order thinking [3].

The North American Pharmacist Licensure Examination (NAPLEX) is a health profession examination that measures a candidate’s knowledge of the practice of pharmacy [1]. NAPLEX uses assessment items of different formats and types, including constructed-response/open-ended items and MCQs (A-type, K-type, true-false items, etc.) [13], with case based items as the most prevalent format [3].

While the quality of assessment items of different formats and types (case based items, MCQs, and open-ended items) has been addressed by many authors [3, 10, 11, 14-18], only two studies [3, 19] have attempted to compare the quality of case based and noncase based assessment items.

This study evaluated the quality of the test items of sixth year clinical clerkship examinations. The examinations were developed, based on the revised Bloom’s taxonomy, by a panel of teaching and assessment experts with at least 5 years of experience in each specific clinical clerkship. The quality of the assessment items was investigated as a function of item format (case based versus noncase based) and item type/structure (MCQs versus open-ended/essay items), as was the effect of the number of response choices on the characteristics of MCQ items.

Methods

Data Collection

Assessment items used in the paper-based examinations of six clinical clerkship rotations (Cardiology, Critical Care, Respiratory, Endocrinology, Oncology, and Nephrology) of the sixth/senior year of the PharmD program offered by the School of Pharmacy at the University of Jordan (SP-UJ) were collected. All were final examinations, 60 to 75 minutes long, offered in the first and second semesters of the 2015-2016 academic year. Each examination was constructed and reviewed by a panel composed of an academic adviser/rotation coordinator, an academic staff member, and at least two preceptors.

A total of 173 assessment items were included in this study; no item was excluded. Students’ names and university ID numbers were masked to maintain confidentiality.

Assessment items were mapped to the levels of Bloom’s taxonomy of educational objectives. Each item addressed one of four Bloom’s levels: remembering and understanding skills, analysis skills, application skills, or evaluation and creation skills [20].

Each item was reviewed and categorized by the authors as either case based or noncase based. Case based items were presented in a scenario-based format (i.e., patient profiles with accompanying test questions), so that to analyze and answer them properly, a student had to refer to the information provided in the patient profile [3]. In contrast, noncase based “stand-alone questions” had answers that could be drawn solely from the information provided in the question stem [3]. Items were further classified by type as MCQs or open-ended (essay) items, and MCQs were further classified by the number of answer options.

The examinations were characterized in terms of their reliability using Cronbach’s alpha [21], and individual items were characterized in terms of their difficulty, discrimination, point biserial, and number of options (for MCQs) [9, 11, 12, 14, 16, 18, 22, 23]. Individual item or sub-item grades were entered into SPSS (IBM, Armonk, NY), and the psychometric parameters were estimated. The item performance (psychometric) characteristics calculated were the Difficulty Index (DIF I, difficulty), the Discriminating Index (DI, discrimination), and the point biserial. The values of these parameters were used to classify item quality (Table 1) [6, 12, 15, 16, 23].

Difficulty was calculated as the percentage of correct responses (MCQs) or as the average grade on the item relative to the total mark assigned to it (open-ended items); a higher difficulty index therefore indicates an easier item. The desired value for difficulty ranges from 20%-30% at the lower limit to 75%-80% at the upper limit [12]. Discrimination was calculated as the difference between the average item grade, relative to the item total mark, of students in the upper quartile (those with the highest examination totals) and that of students in the lower quartile (those with the lowest totals); for an item scored as correct/incorrect, this equals the difference in the number of correct responses between the two quartiles divided by the number of students in a quartile. The point biserial is another measure of an item’s discriminative power: it compares performance on the item with performance on the whole test [6, 12, 22, 24, 25]. The point biserial was estimated from the SPSS reliability output [6, 12, 25]. Discrimination and point biserial values can range from -1 to 1. High values indicate that an item was answered correctly by high-performing students and/or incorrectly by low-performing students. Conversely, low or negative values indicate that an item was answered incorrectly by high-performing students and/or correctly by low-performing students, suggesting a poor or flawed item or a poor ability to differentiate between students.
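The three item statistics defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure the authors used: the function names are hypothetical, quartile groups are formed by ranking students on their examination totals (ties broken by input order, group size rounded down), and the point biserial is computed as a corrected item-total Pearson correlation, with the item removed from the total. SPSS’s reliability procedure reports a corrected item-total correlation of this kind, but the text does not state which variant was used.

```python
from statistics import mean


def difficulty_index(item_scores, max_mark):
    """Mean item score as a percentage of the item's maximum mark.

    For a dichotomous MCQ (max_mark=1) this is the percentage of correct
    responses; a higher value means an easier item.
    """
    return 100.0 * mean(item_scores) / max_mark


def discrimination_index(item_scores, total_scores, max_mark, fraction=0.25):
    """Upper-quartile minus lower-quartile mean item score (as a fraction of max_mark).

    Assumption: students are ranked by examination total, ties keep input
    order, and the group size is rounded down (minimum one student).
    """
    n = len(item_scores)
    k = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] / max_mark for i in ranked[:k]]
    upper = [item_scores[i] / max_mark for i in ranked[-k:]]
    return mean(upper) - mean(lower)


def point_biserial(item_scores, total_scores, corrected=True):
    """Pearson correlation between item score and examination total.

    With corrected=True the item's own score is subtracted from each total,
    so the item is not correlated with itself. Undefined (ZeroDivisionError)
    if every student has the same item score.
    """
    if corrected:
        totals = [t - s for s, t in zip(item_scores, total_scores)]
    else:
        totals = list(total_scores)
    mx, my = mean(item_scores), mean(totals)
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, totals))
    sx = sum((x - mx) ** 2 for x in item_scores) ** 0.5
    sy = sum((y - my) ** 2 for y in totals) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: one MCQ item (1 = correct) and examination totals for 8 students.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [18, 17, 15, 14, 12, 10, 9, 7]
print(difficulty_index(item, 1))               # 50.0
print(discrimination_index(item, totals, 1))   # 1.0
print(round(point_biserial(item, totals), 2))  # 0.68
```

For a dichotomous item, `discrimination_index` returns (U - L)/n, i.e., the difference in correct responses between the two quartiles divided by the quartile size, matching the equivalence stated in the text.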

Statistical Analysis

Differences in item performance characteristics as a function of item format and type were assessed using the Mann-Whitney test and logistic regression. A one-way multivariate analysis of variance (MANOVA) was used to study the effect of the four Bloom’s levels on difficulty, discrimination, and point biserial. This was followed by univariate analyses of variance (ANOVA) on the dependent variables and Bonferroni-corrected post-hoc pairwise comparisons, with p < .05 indicating statistical significance. All data analyses were performed using SPSS® 23.0 (IBM, Armonk, NY).
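For readers re-running the two-group comparisons outside SPSS, the Mann-Whitney test and a Bonferroni adjustment can be sketched in plain Python. This sketch assumes the tie-corrected normal approximation without a continuity correction; SPSS and other packages may report exact or continuity-corrected p-values, so results can differ slightly, particularly for small groups. The item data are hypothetical, and the MANOVA, ANOVA, and logistic regression steps are not reproduced. The `bonferroni` helper shows only the generic adjustment (each p-value multiplied by the number of comparisons, capped at 1).

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the tie-corrected normal approximation.

    Assumption: no continuity correction and no exact small-sample p-value.
    Raises ZeroDivisionError if all pooled values are identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under the null hypothesis
    return u1, z, p


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical difficulty indices (%) for two groups of items.
case_based = [45.0, 52.5, 60.0, 61.0, 70.0, 72.5, 80.0]
noncase_based = [55.0, 63.0, 66.0, 75.0, 78.0, 85.0]
u, z, p = mann_whitney_u(case_based, noncase_based)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
print(bonferroni([p, 0.01, 0.04]))
```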

Results

A total of 173 items, each answered by 72 to 83 students, were evaluated. These items were collected from six final examinations of clinical clerkships in the senior year of the PharmD program offered by the SP-UJ. The reliability of the studied examinations, measured using Cronbach’s alpha, ranged from .62 to .80. Reliability differed according to item type: MCQs had an average Cronbach’s alpha of .61, while open-ended items had an average Cronbach’s alpha of .27. Each item’s contribution to the reliability of its examination was measured as the difference between the examination’s Cronbach’s alpha and its Cronbach’s alpha if that item was deleted. This contribution showed a significant positive Pearson’s correlation with the item psychometric parameters: difficulty (r = .16, p < .05), discrimination (r = .63, p < .001), and point biserial (r = .85, p < .001).
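Each item’s contribution to examination reliability, as used in the paragraph above (Cronbach’s alpha of the whole examination minus alpha with that item deleted), can be sketched as below. The score matrix is hypothetical, every student is assumed to have a score on every item, and population variances are used throughout; because the same divisor appears in the numerator and denominator of alpha, the choice of divisor does not change the result. The correlations between these contributions and the item parameters are not reproduced here.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items score matrix (list of rows)."""
    k = len(scores[0])
    item_variances = sum(pvariance(column) for column in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    k = len(scores[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in scores]) for j in range(k)]


def reliability_contribution(scores):
    """Alpha minus alpha-if-item-deleted; negative values flag items that lower alpha."""
    alpha = cronbach_alpha(scores)
    return [alpha - a for a in alpha_if_item_deleted(scores)]


# Hypothetical 0/1 scores of 5 students on 4 items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # 0.519
print([round(c, 3) for c in reliability_contribution(scores)])
```

In this toy matrix the fourth item is unrelated to the other three, so deleting it raises alpha and its contribution is negative.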

Table 1 shows the characteristics of the studied assessment items. Over three-quarters of the items (77.5%) were case based. More than half of the items (56.6%) were open-ended, and remembering and understanding was the most frequently assessed Bloom’s level (34.7%). Item analysis showed that 54% of the questions had an excellent difficulty index (50% to below 80%) [10, 12, 15], almost one third (30.6%) had an excellent discrimination index, and around 60% were in the highest (good) point biserial range, while 8.1% of the items had negative (poor/flawed) point biserial values.
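The quality bands of Table 1 amount to a lookup on sorted cut-offs. The sketch below assumes, following the inequalities in Table 1, that each band includes its lower bound, and that difficulty is expressed in percent; the constant names are hypothetical. Applying the labels to the Table 2 means is only for illustration, since Table 1 classifies individual items.

```python
from bisect import bisect_right

# Cut-offs and labels transcribed from Table 1; each band includes its lower bound.
DIFFICULTY_BANDS = ([20, 50, 80], ["Difficult", "Acceptable/Good", "Excellent", "Easy/Poor"])
DISCRIMINATION_BANDS = ([0, 0.2, 0.3, 0.4], ["Poor/Flawed", "Poor", "Acceptable", "Good", "Excellent"])
POINT_BISERIAL_BANDS = ([0, 0.15, 0.25], ["Poor/Flawed", "Poor", "Recommended", "Good"])


def classify(value, bands):
    """Return the label of the interval [lower, upper) that contains value."""
    cutoffs, labels = bands
    return labels[bisect_right(cutoffs, value)]


# Illustration with the Table 2 means for remembering and understanding items.
print(classify(66.8, DIFFICULTY_BANDS))      # Excellent
print(classify(0.41, DISCRIMINATION_BANDS))  # Excellent
print(classify(0.39, POINT_BISERIAL_BANDS))  # Good
```

`bisect_right` places a value equal to a cut-off in the band above it, which is what makes each lower bound inclusive.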

Table 2 shows the means and standard deviations of difficulty, discrimination, and point biserial for items addressing the four Bloom’s levels (remembering and understanding skills, analysis skills, application skills, and evaluation and creation skills), as well as the MANOVA and follow-up test results.

One-way MANOVA revealed significant differences (p < .001) in item characteristics (difficulty, discrimination, and point biserial) as a function of the Bloom’s level measured.

As follow-up tests to the MANOVA, univariate analyses of variance were performed on each dependent variable (difficulty, discrimination, and point biserial). With the Bonferroni method, the univariate ANOVA on difficulty was significant, as were the ANOVAs on discrimination and on point biserial.

Post-hoc pairwise comparisons followed the ANOVAs for difficulty, discrimination, and point biserial. The difficulty index of remembering and understanding items was significantly higher (i.e., these items were easier; p = .03), and that of analysis items significantly lower (i.e., these items were harder), than those of the other levels. In addition, the discrimination index and point biserial of remembering and understanding items and of analysis items were significantly higher than those of application items and of evaluation and creation items.

Table 3 presents item performance characteristics as a function of the different item properties. Case based items did not differ from noncase based items in difficulty (p = .52) or point biserial (p = .47), but they showed significantly lower discrimination (p = .037). Open-ended items had a significantly higher difficulty index (i.e., they were easier; p = .008), discrimination (p < .001), and point biserial (p < .001) than MCQs. On the other hand, 4-option MCQs had a significantly lower difficulty index (p < .001) and point biserial (p = .03) than 5-option MCQs, but did not differ in discrimination (p = .81), suggesting similar discrimination power.

When items were compared by type/structure within each format, case based open-ended items showed a significantly higher difficulty index (i.e., they were easier; p = .001), higher discrimination (p < .001), and higher point biserial (p < .001) than case based MCQs. Noncase based open-ended items, in contrast, had a significantly lower difficulty index (i.e., they were harder; p = .02) and higher discrimination (p = .001) and point biserial (p = .001) than noncase based MCQs. The number of choices in case based MCQs had a significant impact on difficulty (p = .001) but no effect on discrimination (p = .972) or point biserial (p = .163), whereas the number of choices had no effect on noncase based MCQs. Open-ended items formatted as case based had a significantly higher difficulty index (p = .001) but lower discrimination (p = .001) than noncase based open-ended items. Among MCQs, noncase based items showed higher discrimination (p = .001) and point biserial (p = .001) than case based items.

Four-option MCQs differed significantly between the case based and noncase based formats: case based 4-option MCQs were more difficult (lower difficulty index; p < .01) and had lower discrimination and point biserial (p < .01). This was not the case for 5-option MCQs, for which item format had no effect on difficulty or discrimination; only the point biserial was significantly, though slightly, higher for noncase based items (p < .01).
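The layered rows of Table 3 report mean (SD) values within strata defined by one or two item properties. A minimal bookkeeping sketch follows; the item records, field names, and values are hypothetical, and it computes only the descriptive summaries, not the significance tests.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records, strata, metric):
    """Mean and sample SD of `metric` for every combination of the `strata` fields.

    The SD is reported as NaN for single-item groups.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[field] for field in strata)].append(record[metric])
    return {
        key: (mean(values), stdev(values) if len(values) > 1 else float("nan"))
        for key, values in groups.items()
    }


# Hypothetical item records; field names and values are illustrative only.
items = [
    {"format": "case", "type": "open", "options": None, "difficulty": 70.0},
    {"format": "case", "type": "open", "options": None, "difficulty": 60.0},
    {"format": "case", "type": "mcq", "options": 4, "difficulty": 40.0},
    {"format": "case", "type": "mcq", "options": 5, "difficulty": 70.0},
    {"format": "noncase", "type": "open", "options": None, "difficulty": 55.0},
    {"format": "noncase", "type": "mcq", "options": 4, "difficulty": 60.0},
    {"format": "noncase", "type": "mcq", "options": 5, "difficulty": 62.0},
]

# Format x type layer, analogous to the "Case Based: Open-ended / MCQs" rows.
for key, (m, sd) in sorted(summarize(items, ("format", "type"), "difficulty").items()):
    print(key, round(m, 1), round(sd, 1))

# MCQs only, layered by format and number of options.
mcqs = [r for r in items if r["type"] == "mcq"]
print(summarize(mcqs, ("format", "options"), "difficulty"))
```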

Regression analysis (Table 4) confirmed significant interactions between item type (open-ended or MCQ), item format (case or noncase based), and the number of choices in MCQs for item difficulty. The difficulty of an item was significantly affected by item type, item format, the number of choices in MCQs, and the interactions between type and format and between format and number of choices. The discrimination index, on the other hand, was affected only by item format, while the point biserial was significantly affected by item type, item format, and the number of choices, but not by the interactions between these factors.

Discussion

The present study addressed the quality of assessment items in sixth year PharmD clinical clerkship examinations. It yielded three outcomes of potential benefit to academic staff and preceptors: (1) the reliability of an examination correlated significantly with the psychometric parameters of its items; (2) the Bloom’s level associated with an item significantly affected its psychometric properties; and (3) the structure of an item and its number of options significantly affected its psychometric parameters.

The predominant item format in the current study was case based, in which the basic-level competencies, remembering and understanding skills, constituted the majority (around two-thirds) of the measured skills. These competencies are the foundation for the higher competency levels (e.g., analysis, application, and evaluation and creation skills). The use of case based items in the assessment of students in a health care professional program, such as a PharmD program, is necessary. A case based item introduces students to clinical scenarios that simulate patient situations and enables them to practice decision making when facing realistic challenges.

Building case based items is a time-consuming task that requires a knowledgeable examiner with practice expertise [3, 7]. The psychometric parameters of the assessment items in our study indicated high quality, with at most 8.1% of items classified as poor/flawed [16, 23]. Together with the benefits of using case based items, the items’ psychometric parameters and the Cronbach’s alpha values of the examinations support the reliability of the examinations under study [21].

Evaluation of the effect of competency level on item difficulty showed that items addressing analysis skills were more difficult, whereas items addressing remembering and understanding skills, at the other end of the scale, were much easier. These findings agree with those of Kim and colleagues (2012), who found that analysis and synthesis items were more difficult [24].

Evaluation of the discrimination measures (discrimination index and point biserial) showed that items addressing remembering and understanding skills and analysis skills were more effective in differentiating between students in the upper and lower quartiles.

Analysis of the difficulty, discrimination, and point biserial of the two item formats showed no differences between case based and noncase based items in difficulty and point biserial, with slightly better discrimination for noncase based items. These results are similar to the findings of Phipps and Brackbill (2009) [3], indicating comparable capability of the two item formats.

The type of an item has a significant effect on its psychometric characteristics. Open-ended items were easier, yet more discriminative, which agrees with the findings of Thawabieh (2016) [19]. This may be because open-ended items allow students to incorporate more detail into their answers, while their reliance on higher-order thinking allows better discrimination between high- and low-performing students. On the other hand, the options in MCQs may give students a hint about the item writer’s intention [24].

The number of options showed a significant impact on both difficulty and discrimination, measured as the difficulty index and the point biserial, respectively: the higher the number of options, the easier and more discriminative the item. This partially agrees with the findings of Phipps and Brackbill (2009), who found that 5-option items were more difficult and more discriminative. Nevertheless, they concluded that, because the differences between the two groups were very small, it is justifiable to use a mix of 4- and 5-option MCQs in examinations [3].

Analyzing case based and noncase based items separately revealed different behaviors. Among case based items, open-ended items were significantly easier and more discriminative than MCQs, whereas among noncase based items, open-ended items were more difficult and more discriminative than MCQs. This may be attributed to the scenarios in case based items, which may simplify the item and guide the examinee, although the item still has to be answered in context.

The number of answer options (4 or 5) had no effect on the discrimination metrics of either case based or noncase based items; it affected only the difficulty of case based items, with 4-option questions being more difficult. Writing additional plausible and effective options besides the key answer is clearly more challenging when an item is based on a case full of details.

Noncase based open-ended items were more difficult and more discriminative than case based open-ended items. In addition, noncase based MCQs had higher discrimination and point biserial values, showing that noncase based items are more discriminative. Again, case based items showed similar, if not inferior, behavior compared with noncase based items; their benefit therefore lies in their ability to address intended learning outcomes and course aims rather than in any unique performance assessment characteristics.

A further finding of the current study was the different effect of item format on the characteristics of 4- and 5-option MCQs. Noncase based 4-option MCQs were significantly easier and more discriminative than case based 4-option MCQs. However, case based and noncase based 5-option MCQs did not differ in difficulty or discrimination index, and differed only slightly in point biserial while remaining within the same recommended point biserial value range.

These results showed differences between the two MCQ groups but cannot be considered conclusive: constructing a case item, and constructing strong, reliable, and efficient answer options for an MCQ, are both challenging and time-consuming tasks, regardless of whether the item is case based.

Sheaffer and Addo (2013), who measured second year PharmD students’ performance and confidence in answering selected-response and constructed-response items, concluded that students performed better and felt more confident when answering selected-response items. They nevertheless recommended incorporating constructed-response teaching and testing methods into pharmacy education [13].

One important limitation of the present study is the unequal number of items per group, which may have affected the analysis. In addition, classifying items by Bloom’s level may be subjective [8]; to minimize disagreement, items were peer reviewed by both preceptors, who work directly with real-life cases, and academic staff/educators. Finally, our analysis is based on Classical Test Theory; an alternative approach to evaluating item properties is Item Response Theory, which models test and item scores using assumptions about the mathematical relationship between examinee ability and item responses [26].

Conclusion

Reliable and effective assessment of students is crucial in health care professional programs, where decisions related to patients’ treatment are made. PharmD students should be trained to deal with real medical cases during their studies, especially in the senior year. Psychometric parameters are efficient tools for evaluating clerkship examination items. The study showed that the psychometric properties of items depend on the associated Bloom’s levels. Item format, structure, and number of options in MCQs, as well as the combinations of these factors, affected the psychometric properties of the items and the value of Cronbach’s alpha. Building examinations that can measure student learning and contribute to program development is a daunting task, so it is critical to develop training programs that teach educators how to construct “good” items and examinations.

List of Abbreviations

ACPE; Accreditation Council for Pharmacy Education, PharmD; Doctor of Pharmacy, MCQ; Multiple Choice Question, NAPLEX; North American Pharmacist Licensure Examination, SP-UJ; School of Pharmacy at the University of Jordan, DIF I; Difficulty Index, DI; Discriminating Index, MANOVA; multivariate analysis of variance, ANOVA; analysis of variance.

Declarations

Ethics approval and consent to participate

This study was approved by the SP-UJ Scientific Research Committee (IRB: 7/2017).

Consent for publication

Not applicable

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

Not applicable

Authors’ contributions

HA participated in designing the study and in writing the manuscript. GB revised the data analysis and participated in writing the manuscript. AK revised and modified the tested examinations and the written manuscript. SA participated in designing the study, revised the data entry, carried out the data analysis, and participated in writing the manuscript. All authors have read and approved the manuscript. In addition, all authors are aware of this submission and agree with it.

Acknowledgement

The authors would like to acknowledge the faculty members and preceptors at the Department of Biopharmaceutics and Clinical Pharmacy at the SP-UJ for their cooperation. The authors would also like to acknowledge Miss Sara Jalouqa for her help with data entry.

References

[1] Tofade T, Elsner J, Haines ST. Best Practice Strategies for Effective Use of Questions as a Teaching Tool. Am J Pharm Educ. 2013;77(7):1-9, Article 155.
[2] OECD. Synergies for Better Learning: An International Perspective on Evaluation and Assessment. OECD Reviews of Evaluation and Assessment in Education. Paris; 2013. https://doi.org/10.1787/9789264190658-7-en
[3] Phipps SD, Brackbill ML. Relationship between Assessment Item Format and Item Performance Characteristics. Am J Pharm Educ. 2009;73(8):1-6, Article 146.
[4] Accreditation Council for Pharmacy Education. Accreditation Standards and Key Elements for the Professional Program in Pharmacy Leading to the Doctor of Pharmacy Degree ("Standards 2016"). 2015.
[5] Sullivan GM. A primer on the validity of assessment instruments. J Grad Med Educ. 2011;3:119-120.
[6] Varma S. Preliminary Item Statistics Using Point-Biserial Correlation and P-Value. Educational Data Systems, Inc.; 2006. eddata.com. Accessed 13 September 2018.
[7] Garavalia LS, Marken PA, Sommi RW. Selecting Appropriate Assessment Methods: Asking the Right Questions. Am J Pharm Educ. 2002;66:108-12.
[8] Wallman A, Lindblad AW, Hall S, Lundmark A, Ring L. A Categorization Scheme for Assessing Pharmacy Students’ Levels of Reflections during Internship. Am J Pharm Educ. 2008;72(1):1-10, Article 05.
[9] Caldwell DJ, Pate AN. Effects of Question Formats on Student and Item Performance. Am J Pharm Educ. 2013;77(4):1-5, Article 71.
[10] Palmer EJ, Devitt PG. Assessment of Higher Order Cognitive Skills in Undergraduate Education: Modified Essay or Multiple Choice Questions? BMC Med Educ. 2007;7(49):1-7.
[11] Medina MS. Relationship between Case Question Prompt Format and the Quality of Responses. Am J Pharm Educ. 2010;74(2):1-7, Article 29.
[12] Al Muhaissen SA, Ratka A, Akour A, Alkhatib HS. Currents in Pharmacy Teaching and Learning. https://doi.org/10.1016/j.cptl.2018.12.006
[13] Sheaffer EA, Addo RA. Pharmacy Student Performance on Constructed-Response Versus Selected-Response Calculations Questions. Am J Pharm Educ. 2013;77(1):1-7, Article 6.
[14] Chauhan PR, Rathod SP, Chauhan BR, Chauhan GR, Adhvaryu A, Chauhan AP. Study of Difficulty Level and Discriminating Index of Stem Type Multiple Choice Questions of Anatomy in Rajkot. BIOMIRROR. 2013;4(6):37-40.
[15] Sabri S. Item Analysis of Student Comprehensive Test for Research in Teaching Beginner String Ensemble Using Model Based Teaching among Music Students in Public Universities. Int J Educ Res. 2013;1(12):1-14.
[16] Siri A, Freddano M. The Use of Item Analysis for the Improvement of Objective Examinations. Procedia - Social and Behavioral Sciences. 2011;29:188-97.
[17] Tarrant M, Ware J, Mohammed AM. An Assessment of Functioning and Non-Functioning Distractors in Multiple-Choice Questions: A Descriptive Analysis. BMC Med Educ. 2009;9(40):1-8.
[18] Trevisan MS, Sax G, Michael WB. The Effects of the Number of Options Per Item and Student Ability on Test Validity and Reliability. Educational and Psychological Measurement. 1991;51(4):829-37.
[19] Thawabieh AM. A Comparison between Two Test Item Formats: Multiple-Choice Items and Completion Items. British J Educ. 2016;4(8):63-74.
[20] Anderson LW, Krathwohl DR. A Taxonomy for Learning, Teaching, and Assessing. Abridged Edition. 2001. pp. 66-67.
[21] Norcini JJ, Swanson DB, Grosso LJ, Webster GD. Reliability, Validity and Efficiency of Multiple Choice Question and Patient Management Problem Item Formats in Assessment of Clinical Competence. Med Educ. 1985;19:238-47.
[22] Brown JD. Point-Biserial Correlation Coefficients. JALT Testing & Evaluation SIG. 2001;5(3):13-17.
[23] Sim S, Rasiah RI. Relationship between Item Difficulty and Discrimination Indices in True/False-Type Multiple Choice Questions of a Para-Clinical Multidisciplinary Paper. Annals Academy of Medicine. 2006;35:67-71.
[24] Kim MK, Patel RA, Uchizono JA, Beck L. Incorporation of Bloom’s Taxonomy into Multiple-Choice Examination Questions for a Pharmacotherapeutics Course. Am J Pharm Educ. 2012;76(6):1-8, Article 114.
[25] IBM Corporation SPSS. Using SPSS for Item Analysis. SPSS Inc.; 1998.
[26] Baker FB. The Basics of Item Response Theory. ERIC Clearinghouse on Assessment and Evaluation; 2001.

Table 1. Descriptive Statistics for Evaluated Assessment Items (N=173)
Variable	No. (%)
Format
Case based item	134 (77.5)
Non-case based item	39 (22.5)
Type/Structure
Open-ended/essay item	98 (56.6)
Multiple choice item	75 (43.4)
Number of Choices*
2 (True/False)	11 (14.7)
3	3 (4)
4	33 (44)
5	28 (37.3)
Bloom’s level
Remembering and Understanding Skills	60 (34.7)
Analysis skills	57 (32.9)
Application Skills	28 (16.2)
Evaluation and Creation Skills	28 (16.2)
Difficulty Index (DIF I) Levels ¹
Difficult (DIF I<20%)	6 (3.5)
Acceptable/Good (20%≤DIF I<50%)	29 (16.7)
Excellent (50%≤DIF I<80%)	93 (53.8)
Easy/Poor (DIF I≥80%)	45 (26)
Discriminating Index (DI) Levels ¹
Poor/Flawed (DI<0)	5 (2.9)
Poor (0≤DI<.2)	36 (20.8)
Acceptable (.2≤DI<.3)	40 (23.1)
Good (.3≤DI<.4)	39 (22.5)
Excellent (DI≥.4)	53 (30.6)
Point-Biserial (PBS) Levels ²
Poor/Flawed (PBS<0)	14 (8.1)
Poor (0≤PBS<.15)	28 (16.2)
Recommended (.15≤PBS<.25)	28 (16.2)
Good (PBS≥.25)	103 (59.5)
* Percent relative to MCQs count.

Table 2. Item Performance Characteristics Based on the ILO (Bloom’s) Level Measured by the Item
Performance Characteristics	Remembering and Understanding Skills Mean (SD)	Analysis Skills Mean (SD)	Application Skills Mean (SD)	Evaluation and Creation Skills Mean (SD)
Level of Difficulty	66.8 (18.9)^a	56.4 (22.1)^a	63.6 (17.1)	63.0 (22.3)
Discriminating Index	.41 (.19)^b	.40 (.19)^b	.22 (.09)	.25 (.11)
Point-biserial	.39 (.12)^b	.38 (.18)^b	.21 (.10)	.23 (.12)
ILO: Intended Learning Outcome. ^a Significantly different from each of the other three skill levels in pairwise comparisons.
^b Significantly different from application skills and from evaluation and creation skills in pairwise comparisons.

Table 3. Item Performance Characteristics Based on Different Item Properties and Layers (N=173).
	Level of Difficulty Mean (SD)	Discriminating Index Mean (SD)	Point-biserial Mean (SD)
Item Format
Case Based	64.4 (20.7)	.33 (.21)	0.35 (.20)
Noncase Based	62.9 (20.1)^NS	.45 (.21)*	0.23 (.12)^NS
Item Type/Structure
Open-ended/Essay	66.5 (18.3)	.39 (.23)	.37 (.13)
Multiple Choice	60.8 (22.8)*	.26 (.22)*	.24 (.11)*
Number of Choices/options
4-option Items	50 (20.1)	.39 (.21)	.17 (.1)
5-option Items	65.3 (23.3)*	.35 (.24)^NS	.29 (.1)*
Case Based: Item Type/Structure
Case Based: Open-ended	67.5 (18.5)	.38 (.11)	.37 (.13)
Case Based: MCQs	58.2 (23.3)*	.25 (.21)*	.14 (.12)*
Noncase Based: Item Type/Structure
Noncase Based: Open-ended	56.2 (11.4)	.52 (.23)	.41 (.11)
Noncase Based: MCQs	64.8 (21.7)*	.36 (.21)*	.23 (.11)*
Case Based: Number of Choices/options
Case Based: 4-option	45.8 (20.9)	.31 (.20)	.19 (.11)
Case Based: 5-option	69.1 (20.8)*	.29 (.20)^NS	.18 (.11)^NS
Noncase Based: Number of Choices/options
Noncase Based: 4-option	59 (14.8)	.35 (.20)	.24 (.11)
Noncase Based: 5-option	61.1 (25.6)^NS	.39 (.20)^NS	.29 (.12)^NS
Open-ended: Item Format
Open-ended: Case Based	67.5 (18.5)	.36 (.21)	.38 (.12)
Open-ended: Noncase Based	56.2 (11.4)*	.57 (.22)*	.42 (.14)^NS
MCQ: Item Format
MCQ: Case Based	58.2 (23.3)	.23 (.20)	.14 (.18)
MCQ: Noncase Based	64.8 (21.7)^NS	.35 (.21)*	.26 (.11)*
4-option: Item Format
4-option: Case Based	45.8 (20.9)	.27 (.19)	.15 (.09)
4-option: Noncase Based	59 (14.8)*	.38 (.20)*	.24 (.11)*
5-option: Item Format
5-option: Case Based	69.1 (20.8)	.28 (.17)	.13 (.09)
5-option: Noncase Based	61.1 (25.5)^NS	.33 (.19)^NS	.27 (.11)*
NS: Not significant. * Significant (p < .05).

Table 4: Regression Analysis of Item Performance as a function of Item Characteristics
Predictor	Difficulty Index: Chi-square	Difficulty Index: p-value	Discrimination Index: Chi-square	Discrimination Index: p-value	Point Biserial: Chi-square	Point Biserial: p-value
Model	54.609	0.000	42.224	0.000	141.042	0.000
Open-ended (essay)/MCQs items	16.130	0.001	3.727	0.444	22.025	0.000
Case/Noncase based items	19.903	0.000	19.474	0.001	40.597	0.000
Number of Choices	9.600	0.022	2.474	0.649	9.636	0.022
Open-ended (essay)/MCQs items* Case/Noncase based items	18.884	0.000	-^$	-^$	-^$	-^$
Case/Noncase based items* Number of Choices	12.841	0.005	-^$	-^$	-^$	-^$
^$ Excluded from the model: under the stepwise procedure, no effects could be added to or removed from the initial model.

Editorial history
Editorial decision: Major revision, 07 Apr, 2020
Review #1 received at journal: 31 Mar, 2020
Reviewer #1 agreed at journal: 16 Mar, 2020
Reviewers invited by journal: 13 Mar, 2020
Editor assigned by journal: 20 Feb, 2020
Submission checks completed at journal: 19 Feb, 2020
Editor invited by journal: 19 Feb, 2020
