Item Analysis: The impact of distractor efficiency on the discrimination power of multiple choice items

Background: Distractor efficiency of multiple-choice item responses is a component of item analysis used by examiners to evaluate the credibility and functionality of distractors.
Objective: To evaluate the impact of the functionality (efficiency) of distractors on the difficulty and discrimination indices.
Methods: A cross-sectional study in which a standard item analysis of an 80-item test consisting of type A MCQs was performed. Correlation and significance of variance among the difficulty index (DIF), discrimination index (DI), and distractor efficiency (DE) were measured.
Results: There is a significant, moderate positive correlation between the difficulty index and distractor efficiency; that is, a high difficulty index tends to go with high distractor efficiency (and vice versa). There is a weak positive correlation between distractor efficiency and the discrimination index.
Conclusions: Non-functional distractors can reduce the discrimination power of multiple-choice questions. More training and effort in constructing plausible options for MCQ items is essential for the validity and reliability of tests.


Introduction
Well-constructed multiple-choice questions (MCQs) are appropriate tools for assessing the cognitive learning domain. They can test a wide range of knowledge, including recall, comprehension, and problem solving, with high objectivity and accurate interpretation of content validity (1).
The effectiveness of MCQs in assessing student learning can be measured by both pre-validation and post-validation methods. Item analysis is a statistical process used as a post-validation method for measuring the effectiveness of MCQs with respect to their validity and reliability (2). Item analysis is a mathematical analysis of students' responses on an exam (test) to evaluate the quality of items and consequently improve the assessment (3). The main advantage of item analysis is its ability to increase the effectiveness of the exam, either by refining defective items or by deleting poorly constructed ones from the question bank (3,4).
Item analysis includes three components: the difficulty index (DIF), the discrimination index (DI), and the distractor efficiency (DE) (5,6). The distractor efficiency (DE) is the component of item analysis that allows the assessor to evaluate the credibility of the incorrect options (distractors) of an MCQ item. A distractor is considered functional (FD) if it is selected by no less than 5% of the examinees (7). Distractor efficiency (DE) is calculated according to the number of non-functional distractors (NFDs) per item (8).
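For illustration, the following is a minimal Python sketch of how these three indices can be derived from the response counts of a single type A item. The response counts, the upper/lower grouping, and the mapping from the NFD count to a DE percentage are assumptions made for the example, not values or procedures taken from this study.

```python
# Minimal sketch (illustrative data, not from the paper) of the three indices
# for one type A MCQ item with a single key and three distractors.

def difficulty_index(n_correct: int, n_total: int) -> float:
    """DIF: percentage of examinees who answered the item correctly."""
    return 100.0 * n_correct / n_total

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """DI: contrast between top and bottom scoring groups (e.g., upper/lower 27%)."""
    return (upper_correct - lower_correct) / group_size

def distractor_efficiency(distractor_counts: list[int], n_total: int) -> float:
    """DE: share of distractors that are functional, i.e., chosen by >= 5% of examinees."""
    nfd = sum(1 for c in distractor_counts if c / n_total < 0.05)
    return 100.0 * (len(distractor_counts) - nfd) / len(distractor_counts)

# Hypothetical item answered by 45 students: 31 chose the key; the three
# distractors drew 9, 4, and 1 responses respectively.
print(difficulty_index(31, 45))              # 68.9
print(discrimination_index(10, 4, 12))       # 0.5, with upper/lower groups of 12
print(distractor_efficiency([9, 4, 1], 45))  # 66.7: one NFD (1/45 < 5%)
```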
This study was conducted to evaluate the impact of non-functional distractors (NFDs) on the difficulty and discrimination indices of test items.

Materials and Methods
Study design
This is a cross-sectional, analytic study. It was conducted at the University of Bisha, College of Medicine (UBCOM) from April to June 2018. UBCOM adopts a three-phase, integrated approach to its undergraduate medical curriculum.

Material
One of the phase I module exams, Principles of Human Diseases, was selected for the study. This is an integrated, multidisciplinary module conducted during the second semester of year two (the number of students was 45).
The test consisted of 59 multiple-choice items (type A MCQs). Each item is formed of a stem followed by four options: a single best answer and three distractors.
There was no penalty for blank or wrong answers.

The exam blueprint was developed by the course instructors and reviewed by a standing Students Assessment Committee (9). Following the exam, a standard item analysis was obtained (Apperson DataLink 1200) and processed for the study.

Ethical consideration
The study was approved by the UBCOM research and ethics committee.

Statistical analyses
The data obtained from the standard item analysis were analyzed using SPSS version 20 (IBM Corp., Armonk, NY, USA). Descriptive statistics and the Pearson correlation coefficient were applied to measure the significance of differences and the correlations among the variables. The level of significance was set at 95%, and P < 0.05 was considered significant.
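As a hedged illustration only (the study itself used SPSS), the same correlation test can be reproduced with scipy; the per-item index values below are invented for the example and are not the study data.

```python
# Pearson correlation between two per-item indices, as an open-source
# alternative to the SPSS procedure used in the study (illustrative data).
from scipy.stats import pearsonr

dif = [72.0, 55.5, 81.0, 40.2, 66.7, 90.1]     # difficulty index per item (%)
de = [100.0, 66.7, 100.0, 33.3, 66.7, 100.0]   # distractor efficiency per item (%)

r, p = pearsonr(dif, de)
print(f"r = {r:.4f}, p = {p:.6f}")  # significant if p < 0.05
```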

Item analysis
The total number of items analyzed was 59. The average score of the class was 55.

Discussion
The number of exam items was adjusted according to the course blueprint and the tested domains.
The KR-20 of the examination was 0.906, which is ideal and indicates high reliability of the standard examination (12,13). Values of 0.8 and higher are the aim in medical education.
This finding is in agreement with the work of Kehoe (12-14), who reported that for short tests (10-15 items) values as low as 0.5 are reasonable, whereas tests with more than 50 items should yield values of 0.8 or higher. Low KR-20 values have been linked to many easy or difficult questions, poorly written non-discriminating items, non-homogeneity of the educational content, and a discrepancy between the assessment level and the educational task (14,15).
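For reference, KR-20 can be computed directly from a 0/1 score matrix; the sketch below uses simulated data (45 students by 59 items, generated from an assumed ability model), not the actual exam responses.

```python
# Hedged sketch of the KR-20 reliability coefficient for dichotomously
# scored items: KR-20 = (k/(k-1)) * (1 - sum(p*q) / var(total scores)).
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """KR-20 for a students x items matrix of 0/1 scores."""
    k = scores.shape[1]                          # number of items
    p = scores.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Simulated 45 students x 59 items; student ability drives correctness so
# that items are positively correlated (purely illustrative data).
rng = np.random.default_rng(0)
ability = rng.normal(size=(45, 1))
difficulty = rng.normal(size=(1, 59))
scores = (ability - difficulty + rng.normal(size=(45, 59)) > 0).astype(int)
print(f"KR-20 = {kr20(scores):.3f}")
```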
Both the majority of exam items (72.9%) and the average exam difficulty (69.4 ± 21.86) were within the acceptable difficulty index range. Moreover, 69.5% of the items were categorized as excellently discriminating, and the average exam discrimination index was 0.3 (± 0.16).
The type of correlation between DE and DIF indicates that items with fewer non-functional distractors have a high difficulty index (easy items) and vice versa. This finding is in agreement with recent works by Hingorjo et al., Burud et al., and Kheyami et al. (8,16,17).
A decreased number of non-functional distractors increases the difficulty index of items (they become easier). These authors also reported that NFDs reduce the DIS of items. In the current study, DE had a weak positive correlation with DIS (P = 0.047437, significant at P < 0.05). NFDs can affect the discrimination power of the item (8) and should be replaced by more plausible distractors (11), or the item should be removed from the test (18). Such items have a high DIF (8), as almost all students get them right (11), or they become distracting and cause a false assessment (10).
NFDs have been linked to minimal training in item writing and distractor selection (7,19,20).
It is clear that DE has an impact on both DIF and DIS individually, but whether it can affect both of them at the same time needs more research.
Items with non-functional distractors can be present in any exam or test; the second step, after identifying them in a running exam, remains open. In such items, the non-functional distractors can be replaced with more plausible ones, or the question can be deleted from the bank. The area of debate is the status of these items in the current exam or test. In the current study, deletion of items with two or three non-functional distractors increased the average difficulty index of the exam from 36.83 to 42.82, and DE showed a non-significant correlation with the DIF of items (r = 0.2296, p = 0.133806). Deleting such items from an exam or test can affect students' results and raises ethical debate. Kehoe (1995) reported that deletion of such items is ethical and justifiable (14). He argued that the test aims to determine the rank of each student; using items or questions with unacceptable psychometrics works against this objective and degrades the accuracy of the resulting ranking.
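A sketch of this re-analysis step, under the assumption that per-item records of DIF, DE, and NFD counts are available, might look as follows; the item records are illustrative, not the study data.

```python
# Hedged sketch: drop items with two or more NFDs, then re-test the
# DE-DIF correlation, mirroring the re-analysis described above.
from scipy.stats import pearsonr

items = [  # (difficulty index, distractor efficiency, number of NFDs)
    (72.0, 100.0, 0), (55.5, 66.7, 1), (81.0, 33.3, 2),
    (40.2, 0.0, 3), (66.7, 66.7, 1), (90.1, 33.3, 2),
]

kept = [(dif, de) for dif, de, nfd in items if nfd < 2]  # retain items with < 2 NFDs
r, p = pearsonr([d for d, _ in kept], [e for _, e in kept])
print(f"after deletion: r = {r:.4f}, p = {p:.6f}")
```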
Limitations of this study include the small numbers of students and items and its application to a single course. The strength of the study is that the test is considered valid and reliable.