Using crossed random-effects models to test the effects of None-of-the-above and personality traits in multiple-choice items
Measuring students’ knowledge of a subject is not easy, as a single assessment task often has to summarize months or years of study. For this reason, writing sound exams is an important academic task, and it demands time and effort from lecturers. In this regard, multiple-choice (MC) items are currently popular for measuring knowledge, as they allow a large number of students to be evaluated accurately, objectively, and quickly, enabling lecturers to sample a large syllabus efficiently (Haladyna et al., 2019). This requires items that span a wide range of difficulty, with items of intermediate difficulty predominating and, more importantly, with sound psychometric properties, so that they distinguish between examinees with different levels of knowledge. Stems and response options must therefore be clear enough and must not induce errors through their wording, so that only the examinees’ knowledge is assessed. Additionally, examinees with different personality traits show different academic performance (Chamorro-Premuzic & Furnham, 2003; Furnham et al., 2003; Furnham & Chamorro-Premuzic, 2004; Conard, 2006; Chamorro-Premuzic & Furnham, 2008; Beaujean et al., 2011; De Feyter et al., 2012; Nye et al., 2013; Coenen et al., 2021), and these traits sometimes interact with item-writing flaws, making the assessment unfair and dependent on the examinees’ personality (Frary, 1991).
Haladyna and Downing (1989a, 1989b) were the authors responsible for standardizing the item-writing recommendations. They reviewed a wide set of manuals in which professors and teachers described their own experience creating items and compiled the guidance most widely shared among those authors, finally collecting 31 recommendations (Haladyna et al., 2002). However, these recommendations were not based on empirical criteria. For this reason, some studies have tested these guidelines empirically to evaluate their appropriateness (e.g., Downing, 2005).
Fourteen of these 31 initial recommendations focused specifically on writing the alternatives to the correct answer, known as distractors (Haladyna & Rodriguez, 2013). The difficulty of writing distractors lies in making them plausible, and this takes up most of the time spent designing an exam. If examinees can easily discard the alternatives to the correct answer, the items will lack discrimination, as there is no way to distinguish people with high knowledge of the subject from those without it. Because this writing involves considerable effort, teachers sometimes prefer alternatives that cost little to write, such as All-of-the-above (AOTA) and None-of-the-above (NOTA), since they are easy to generate (Frey et al., 2005). However, the use of these options is often discouraged, even though research on the relevance of the writing recommendations is limited and the empirical evidence collected offers inconclusive results (Tarrant et al., 2009). Therefore, finding a strategy that reduces the cost of writing items without reducing their quality is relevant.
Use of NOTA
In their first review, Haladyna and Downing (1989b) noted that items with NOTA are more difficult and less discriminating, that test scores are less reliable, and that criterion validity is negatively affected. They concluded that this alternative offered no advantage that would justify recommending it. Years later, however, they changed the recommendation, pointing out that NOTA can be used with caution, since it increases difficulty but does not affect discrimination. Even so, they maintained that the ideal is to write distractors that add information to the item and keep the focus on the question (Haladyna & Rodriguez, 2013). Nevertheless, contradictory results are still found.
On the one hand, Rich and Johanson (1990) argued that NOTA can be included as a distractor in some items to raise their difficulty toward an intermediate level, which increased their discrimination. However, they recommended this only for cases in which better distractors cannot be created, since NOTA is a better option than a non-functional distractor but worse than a functional one. They also strongly advised against using NOTA in mathematical items, which become easier because a student who makes a calculation error can still guess the item by selecting NOTA when it is the correct answer. Frary (1991) seems to support the use of NOTA when items would otherwise be too easy. However, he disagreed about the contexts in which it can be used, defending its use in objective items (such as mathematical items) in which an option is correct or incorrect without room for interpretation. This author also reported examinees' complaints about the prevalence of NOTA, pointing to the uncertainty created by the possibility that the correct answer is not present. Knowles and Welch (1992) conducted a meta-analysis and concluded that there are no differences in discrimination or difficulty between items with NOTA as an option and those without it.

Dochy et al. (2001) analysed the effect of NOTA in problem solving and highlighted that the more qualitative the solution, the greater the percentage of examinees who select NOTA when it is not the correct answer. They attributed these errors to the ambiguity of the problems. Downing (2005) noted that items that do not follow the writing recommendations are, in general, failed more often than standard items: their difficulty rises artificially and is not related to the examinees' skills. Martínez et al. (2009) reported an interesting result when they analysed the use of NOTA as the correct answer and as a distractor: difficulty increased when NOTA was the correct answer, but there were no differences between omitting it and using it as a distractor. Boland et al. (2010) pointed to an increase in difficulty without better discrimination, although they noted that the data are more mixed and the recommendation less homogeneous than for other guidelines, since their review found authors who argue that the difficulty and discrimination of these items are reasonable, and who even point to higher discrimination and a lower probability of guessing.

DiBattista et al. (2014) also found differences in difficulty when NOTA was the correct option, in contrast to its absence or its use as a distractor. At first sight discrimination did not seem affected, but the effect changed when they considered the examinees' actual knowledge: they asked examinees to write the correct answer whenever they chose NOTA, rescored the items on that basis, and found significant differences in discrimination. They therefore concluded that choosing NOTA can reflect three different reasons: knowing the correct answer, recognising that it is missing without knowing it, or guessing. They consequently discouraged its use, as it brings several problems: it makes items harder, makes them appear more discriminating than they are, makes them easier to guess, and reduces the educational value of the exam. However, they pointed out that these problems appear only when NOTA is the correct answer. Similarly, Pachai et al. (2015) found that NOTA increases difficulty when it is the correct answer and that there are no differences when it replaces any distractor, although discrimination does differ when the most functional distractor is replaced. The authors concluded that difficulty increases when NOTA is the correct answer because, through the distractors, the performance of low trait-level students equals that of high trait-level students. Table 1 summarizes the main results related to NOTA.
Table 1
Results of the NOTA studies
| Authors | Difficulty | Discrimination | Reliability | Criterion validity |
| --- | --- | --- | --- | --- |
| Haladyna & Downing (1989b) | ↑ | ↓ | ↓ | ↓ |
| Haladyna & Rodriguez (2013) | ↑ | = | / | / |
| Rich & Johanson (1990) | ↑ | ↑ | / | / |
| Frary (1991) | ↑ | / | / | / |
| Knowles & Welch (1992) | = | = | / | / |
| Dochy et al. (2001) | ↑ | / | / | / |
| Downing (2005) | ↑ | / | / | / |
| Martínez et al. (2009) | ↑ (correct answer); = (distractor) | / | / | / |
| Boland et al. (2010) | ↑ | = | / | / |
| DiBattista et al. (2014) | ↑ (correct answer); = (distractor) | ↓ (correct answer); = (distractor) | / | / |
| Pachai et al. (2015) | ↑ (correct answer); = (distractor) | ↓ (replacement of the most functional distractor) | / | / |

Note: ↑: increase. ↓: decrease. =: non-significant differences. /: not taken into account.
Additionally, some authors have studied the use of NOTA in relation to the positive effects of assessment on learning, and they have found that, when NOTA is the correct answer, the exam does not help examinees acquire knowledge, as they do not have access to the correct answer while taking the exam (Blendermann et al., 2020; Jang et al., 2014; Odegard & Koen, 2007). In this regard, AOTA and NOTA seem to undermine the purposes of assessment, as the items can be answered correctly without understanding the syllabus (Boland et al., 2010).
Personality traits
As in the case of NOTA, the research shows contradictory results. Chamorro-Premuzic and Furnham (2003) found a positive relationship between Conscientiousness and examination grades, and negative relationships between these grades and both Extraversion and Neuroticism. These authors also found the same positive relationship between academic performance and Conscientiousness and a negative relationship between Extraversion and academic performance (Furnham et al., 2003), as well as a positive correlation between statistics examination grades and Conscientiousness and a negative correlation with Extraversion (Furnham & Chamorro-Premuzic, 2004). Conard (2006) found a positive relationship between three different measures of academic performance and Conscientiousness, but reported no relationship with the other traits. Again, Chamorro-Premuzic and Furnham (2008) found that academic performance correlated positively with both Openness to Experience and Conscientiousness. Beaujean et al. (2011) examined the relationship between personality and reading and math achievement: Conscientiousness and Openness to Experience influenced both reading and math performance, Agreeableness predicted reading achievement, and Extraversion affected math performance. Moreover, De Feyter et al. (2012) found an effect for four of the Big Five traits: Agreeableness, Conscientiousness, and Neuroticism positively predicted academic success, while Extraversion showed a negative relationship. Nye et al. (2013) found an effect on academic success for all the traits except Conscientiousness: a negative relationship with Extraversion and positive effects of Agreeableness, Neuroticism, and Openness to Experience. Finally, Coenen et al. (2021) found that Conscientiousness had a positive relationship with academic performance and Risk Preference a negative one, whereas they did not find significant results for Emotional Stability. Table 2 summarizes the main results of the personality trait studies.
Table 2
Results of the personality traits studies
| Authors | Conscientiousness | Openness to Experience | Agreeableness | Extraversion | Neuroticism | Risk Preference |
| --- | --- | --- | --- | --- | --- | --- |
| Chamorro-Premuzic & Furnham (2003) | + | No effect | No effect | - | - | / |
| Furnham et al. (2003) | + | No effect | No effect | - | No effect | / |
| Furnham & Chamorro-Premuzic (2004) | + | No effect | No effect | - | No effect | / |
| Conard (2006) | + | No effect | No effect | No effect | No effect | / |
| Chamorro-Premuzic & Furnham (2008) | + | + | No effect | No effect | No effect | / |
| Beaujean et al. (2011): reading performance | + | + | - | No effect | No effect | / |
| Beaujean et al. (2011): math performance | + | - | No effect | + | No effect | / |
| De Feyter et al. (2012) | + | No effect | + | - | + | / |
| Nye et al. (2013) | No effect | + | + | - | + | / |
| Coenen et al. (2021) | + | / | / | / | No effect | - |

Note: +: positive effect; -: negative effect; /: not taken into account.
Aim and objective
Considering all this information, the main interest of this paper is to examine the use of NOTA and to analyse whether there are differences when it is used as the correct answer or as a distractor, in order to determine whether it negatively affects the quality of multiple-choice items. To accomplish this, three different forms of the statistics items were designed, depending on whether NOTA appeared and on its function in the item, resulting in three different exams.
Further, personality traits and academic performance have been related in a great number of papers, but most of those studies rely only on correlational analyses. More complex models could capture more accurately the effect of personality traits on the probability of answering exam items correctly, which in turn translates into better or worse academic performance. Conscientiousness, Extraversion, and Neuroticism appear related to academic performance in most of the works mentioned, but including all the Big Five personality traits and Impulsivity in finer-grained analyses could also reveal effects of the other traits.
Additionally, the relationship between personality and how examinees deal with NOTA options could be relevant. As Frary (1991) points out, personality could be related to differences in how examinees react to this type of item, since only some examinees reported increased uncertainty, which could be related to Neuroticism. It is also worth checking whether other personality traits are related to response behaviour when examinees face NOTA.
Unravelling the real functioning of NOTA and determining whether its use could be acceptable in some circumstances will help lecturers save time and effort in creating some items and invest it in others, producing exams that better capture examinees' performance, which will lead to greater validity and fairer assessment. In this sense, more complex models, such as crossed random-effects models, can capture variance attributable to items and examinees simultaneously, as well as possible interactions between item and examinee characteristics.
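To make this modelling approach explicit, a crossed random-effects logistic model for the probability that examinee p answers item i correctly could, for instance, take the following form (an illustrative specification rather than the exact model fitted in this study, with NOTA and Trait as placeholder labels):

\[
\operatorname{logit} P(Y_{pi} = 1) = \beta_0 + \beta_1\,\mathrm{NOTA}_i + \beta_2\,\mathrm{Trait}_p + \beta_3\,(\mathrm{NOTA}_i \times \mathrm{Trait}_p) + \theta_p + b_i, \qquad \theta_p \sim N(0, \sigma^2_{\mathrm{examinee}}), \quad b_i \sim N(0, \sigma^2_{\mathrm{item}}),
\]

where NOTA_i codes the item condition (NOTA absent, NOTA as a distractor, or NOTA as the correct answer), Trait_p is the examinee's score on a given personality trait, and the crossed random intercepts θ_p and b_i capture examinee and item variance simultaneously; β_3 corresponds to the item-by-examinee interaction mentioned above.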
The specific hypotheses tested are:
- The appearance of NOTA will decrease the probability of getting the correct answer; this effect will be more noticeable when NOTA is the correct answer than when it is a distractor.
- Conscientious and open-to-experience examinees will have a higher probability of getting the correct answer.
- Neurotic, impulsive, and extraverted examinees will have a lower probability of getting the correct answer.
- Examinees will behave differently when answering items that include NOTA.
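As a minimal sketch of how such a crossed random-effects model could be fitted (an illustration under assumed data and column names, not the analysis code of this study), the following Python example uses statsmodels with crossed random intercepts for items and examinees entered as variance components:

```python
# Minimal sketch of a crossed random-effects logistic model in Python.
# Assumes a long-format table with one row per examinee-item response and
# hypothetical column names: 'correct' (0/1 score), 'nota' (item condition:
# absent, distractor, or correct answer), 'consc' (Conscientiousness score),
# and 'item'/'examinee' identifiers.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

data = pd.read_csv("responses_long.csv")  # hypothetical file name

# Fixed effects: NOTA condition, trait score, and their interaction.
# Crossed random intercepts for items and examinees are specified as
# variance components, so both sources of variance are modelled at once.
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ C(nota) * consc",
    vc_formulas={"item": "0 + C(item)", "examinee": "0 + C(examinee)"},
    data=data,
)
result = model.fit_vb()  # variational Bayes approximation
print(result.summary())
```

Equivalent models can be fitted with other tools (e.g., a fully Bayesian sampler); the essential point is that item and examinee random intercepts are crossed rather than nested.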