It is unreasonable to expect any undergraduate curriculum to cover every condition within paediatrics in full. Nor would that approach benefit most students, only a minority of whom will go on to practise paediatrics. However, it seems sensible to expect a student to acquire the most basic, common, and relevant knowledge first and foremost. The only means of ensuring that they have obtained this knowledge is to assess it.
When test item difficulty for the MAC examination was ranked within each group, Spearman's rank correlation indicated a positive correlation between the groups' rankings. This illustrates that, broadly, all three groups found the same questions relatively difficult or easy. This adds an element of reliability to the MAC examination: not only are the overall scores equivalent between the two year groups, but performance on individual questions is comparable as well.
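For readers less familiar with the statistic, the following minimal Python sketch shows how Spearman's rank correlation compares two groups' item difficulty rankings. It is an illustration only, not the analysis used in this study: the function names and the five-item difficulty values are hypothetical. Tied values receive the average of their ranks, and rho is then the Pearson correlation of the two rank vectors.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either group's values are all equal

# Hypothetical difficulty indices (proportion correct) for five items in two groups.
group_a = [0.90, 0.75, 0.40, 0.55, 0.20]
group_b = [0.80, 0.85, 0.35, 0.50, 0.30]
print(round(spearman_rho(group_a, group_b), 3))
```

With these made-up values only the two easiest items swap places, so rho is high but below 1. In practice a statistics package (for example SciPy's `scipy.stats.spearmanr`) would normally be used; the sketch only shows the underlying calculation.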
It is generally accepted that, for a multiple-choice question (MCQ) examination, adequate internal consistency is demonstrated by a Cronbach's alpha of at least 0.7 (8). The low Cronbach's alpha seen with the MAC examination may be explained by a poor spread of questions across the curriculum. To improve participation, the examination was limited to only 30 questions, so it is inevitable that some aspects of the curriculum remained untested. It is known that increasing the number of test items statistically improves the reliability of MCQ examinations (9). High-stakes MCQ examinations generally have many more than 30 items; the RCSI official written paper, for example, uses 120 items, and a MAC examination of that length would be expected to have a substantially higher alpha.
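The two statistical ideas in this paragraph can be sketched in a few lines of Python. The sketch below, which uses hypothetical function names and a made-up response matrix rather than MAC data, computes Cronbach's alpha from a candidates-by-items score matrix and then applies the Spearman-Brown prophecy formula, a standard way of predicting reliability when a test is lengthened (here by a factor of 120/30 = 4). The formula assumes the added items are of comparable quality to the existing ones, and the starting alpha of 0.4 is an assumed value for illustration, not the MAC result.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates x items matrix of item scores.

    Population variances are used throughout; the n/(n-1) correction
    would cancel in the ratio, so sample variances give the same alpha.
    """
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def spearman_brown(reliability, factor):
    """Predicted reliability when test length is multiplied by `factor`,
    assuming the added items are parallel to (as good as) the existing ones."""
    return factor * reliability / (1 + (factor - 1) * reliability)

# Hypothetical 0/1 responses: five candidates (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))

# Assumed alpha of 0.4 for a 30-item paper, lengthened to 120 items.
print(round(spearman_brown(0.4, 120 / 30), 2))
```

Under these assumptions, quadrupling the length of a paper with an alpha of 0.4 would be predicted to raise alpha to roughly 0.73, above the conventional 0.7 threshold.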
Only two-thirds of students passed the MAC examination, whereas 96% of the same students passed their university paediatric examination (2). Their level of paediatric knowledge reflects the university curriculum; therefore, there must be significant gaps between the university curriculum and the ‘hidden’ curriculum as determined by non-academic clinicians.
The Royal College of Paediatrics and Child Health (RCPCH) theory examination faculty consider an MCQ item that scores at least 0.2 on the discrimination index to be acceptable and therefore re-usable. Questions that score less than 0.2 are not necessarily discarded; rather, they go through a review process at one of the regular board meetings, where the theory examiners revise the question to improve its future psychometric value. Interestingly, given the original intention of the MAC examination, in which we hoped candidates would demonstrate a basic level of knowledge, this measure would have been less appropriate or useful. The original intention was to design a paper in which marks would be less widely spread and bunched around a relatively high gross score, thereby preventing many of the questions from displaying good discriminating ability, as most candidates would answer them correctly. As it turned out, the paper was relatively difficult, with a low overall gross mean score and a wide spread of marks, and many of the items showed a good ability to discriminate between candidates.
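The discrimination index can be calculated in several ways, and the RCPCH's exact method is not specified here. The Python sketch below assumes the common upper-lower group method, in which candidates are ranked by total score and D is the proportion of the top group answering the item correctly minus the proportion of the bottom group doing so. The 27% group size, the function names, and the candidate data are all hypothetical. Items with D below 0.2 are flagged for review rather than discarded, mirroring the process described above.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower for one item.

    item_scores: 1 (correct) or 0 (incorrect) for each candidate.
    total_scores: each candidate's total mark on the paper.
    fraction: share of candidates in each extreme group (27% assumed here).
    """
    n = len(total_scores)
    g = max(1, round(n * fraction))  # size of the upper and lower groups
    # Rank candidates by total score; ties at a group boundary are broken by list order.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item_scores[i] for i in upper) / g
    p_lower = sum(item_scores[i] for i in lower) / g
    return p_upper - p_lower

def needs_review(d, threshold=0.2):
    """Flag an item for board review (not automatic removal) if D < threshold."""
    return d < threshold

# Hypothetical totals and item responses for ten candidates.
totals = [28, 25, 24, 22, 20, 18, 15, 12, 10, 8]
item_a = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]  # answered mostly by stronger candidates
item_b = [1] * 10                        # answered correctly by everyone

for name, item in [("A", item_a), ("B", item_b)]:
    d = discrimination_index(item, totals)
    print(name, round(d, 2), "review" if needs_review(d) else "acceptable")
```

An item answered correctly by every candidate returns D = 0 and is flagged, which illustrates why a deliberately easy paper, as originally intended for the MAC, would show little discrimination.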
For a well-prepared group of examinees, item difficulty indices may range from 70% to 100% (10). Hence, the passing score is usually higher when the examinee group is more able (11). As the difficulty index moves away from 50% towards either extreme, the discrimination index tends to become low. For example, if all the candidates answer a question correctly, or all answer it incorrectly, there is no way of using that question to discriminate between the best and worst candidates. However, if we omit questions from a potential question bank because candidates have found them too easy or too hard, then, in a sense, the candidates are setting the standard for the papers themselves. We should allow medical professionals to decide what the standard should be (as in our study, where non-faculty clinicians chose the questions) and then use a rigorous standard-setting process, such as the Angoff method, to set the pass mark for the paper. If the questions happen to be very easy, this will result in a high passing score, and successful candidates must correctly answer a higher-than-normal proportion of questions to pass the exam (as was envisaged for the MAC). Similarly, if the questions happen to be very difficult, good standard setting will correct for this with a lower passing score; in that scenario, the student need only answer a few questions correctly to pass.
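A minimal Python sketch of the two quantities discussed here follows. It assumes one common form of the Angoff method, in which each judge estimates, item by item, the probability that a borderline candidate would answer correctly, and the cut score is the mean over judges of each judge's summed estimates. The judge panels, ratings, and function names are hypothetical and do not represent the procedure or data of this study.

```python
from statistics import mean

def difficulty_index(item_scores):
    """Proportion of candidates who answered the item correctly (0 to 1)."""
    return sum(item_scores) / len(item_scores)

def angoff_cut_score(ratings):
    """Angoff cut score, in marks, from a judges x items matrix of estimated
    probabilities that a borderline candidate answers each item correctly."""
    # Each judge's implied pass mark is the sum of their item estimates;
    # the panel's cut score is the mean of those pass marks.
    return mean(sum(judge) for judge in ratings)

# Hypothetical panels of three judges rating a four-item paper.
easy_paper = [[0.9, 0.8, 0.9, 0.7], [0.8, 0.9, 0.9, 0.8], [0.9, 0.9, 0.8, 0.8]]
hard_paper = [[0.3, 0.4, 0.2, 0.5], [0.4, 0.3, 0.3, 0.4], [0.3, 0.3, 0.2, 0.4]]

for name, ratings in [("easy", easy_paper), ("hard", hard_paper)]:
    cut = angoff_cut_score(ratings)
    n_items = len(ratings[0])
    print(f"{name}: pass mark {cut:.2f}/{n_items} ({100 * cut / n_items:.0f}%)")

# Difficulty index for one hypothetical item answered by five candidates.
print(difficulty_index([1, 1, 1, 0, 1]))
```

Because the judges, rather than the candidates, set the expected performance, the easy paper receives a high pass mark and the hard paper a low one, as the paragraph describes.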
From the above analysis of the most difficult questions, a pattern emerges. Questions on respiratory paediatrics appear to have been answered poorly. Whilst respiratory conditions are the most common paediatric presentations, it could be argued that aspects of the teaching approach at the time did not accurately reflect real clinical practice for these conditions. The students also appear to have had consistent difficulty with questions on childhood seizures, despite a strong emphasis on seizure disorders and febrile convulsions in the curriculum. They may not have grasped the subtle but clinically crucial differences between types of seizure presentation. There are multiple examples above where, despite the condition or presentation being included in the curriculum, the students appear to have failed to grasp its true clinical relevance. Interestingly, the questions on which the students performed better than the doctors were quite specific, and their answers can be found directly in the curriculum. This reflects how students with limited clinical experience learn, and is a good example of assessment being a key driver of learning.
The concept of a high-stakes assessment with a relatively high passing score breaks from tradition. However, we feel that this model could work very well to ensure that all licensed practitioners have a firm grasp of a finite body of basic knowledge.
Further research in this area is warranted. Future MAC examinations should include a larger number of test items, which should improve the psychometric properties of the examination and increase its coverage of the curriculum. Results from such studies could inform curriculum development by highlighting specific areas of educational need.