Thirty-one consultant-level doctors participated in interviews, all of whom had recently been examiners at final medical examinations. Experienced examiners were defined as those with over six years’ experience of examining at final medical examination level; some had over 20 years’ examining experience. A broad range of specialties was represented: the majority were consultants (Attendings) in Medicine or Surgery, with a smaller number from Psychiatry, Paediatrics, Academic Medicine, Obstetrics and Anaesthetics. Most experienced examiners had examined in varied institutions at undergraduate and postgraduate level. Demographic data are shown in Table 1.
Table 1. Demographic data by centre
| Centre | Gender | Level of experience |
|---|---|---|
| England | 4 Male, 6 Female | 7 experienced, 3 less experienced |
| Ireland | 9 Male, 1 Female | 9 experienced, 1 less experienced |
| Scotland | 6 Male, 5 Female | 6 experienced, 5 less experienced |
Three main themes were identified: ‘OSCEs are inauthentic’, ‘Looking for glimpses of truth’ and ‘Evolution with experience’. Each representative quote is labelled with a code giving the interview centre (01, 02 or 03) followed by the order in which the participant was interviewed.
Theme 1 OSCEs are inauthentic
While it is well known that OSCEs cannot fully reflect the real world or a clinical environment, it quickly became clear how fundamentally flawed raters consider OSCEs to be. Raters believe flaws are present in many aspects of the exam: the format itself, the scenarios and the marking. Of most concern are the perceived negative influences of these flaws on student learning and behaviour. This firm belief influences how raters think while watching and rating candidates during OSCEs.
Participants identified lack of authenticity as a major shortcoming of OSCEs. Raters outlined ways in which OSCEs lack authenticity, including scenarios not being realistic and stations not accurately representing clinical environments. Simulated patients (SPs) or real patients become overly compliant during examinations as they become accustomed to the examination routine, and SPs’ responses to students can vary throughout the day. There is a belief that OSCEs fail to reflect real day-to-day work; many participants used the word ‘fake’ during the interviews.
“You cannot simulate for an equivalent type situation in the clinical practice, there is no point in saying that an emergency down in the ED or on the floor or in the operating room is the same as an exam, there is no comparison.” 02010
“they’re so artificial, you know the things you’re asked to do in seven minutes are ludicrous in reality, you would not do that at all and it would be dangerous to do some of the things in seven minutes so you learn how to pass OSCEs as a person sitting them” 01002
As well as the overall set-up lacking authenticity, many raters also cited problems with the way OSCEs are marked. Issues mentioned included the technology used for marking, lengthy checklists, and uncertainty over the constructs being examined. Examiner fatigue, and raters examining at stations of which they have no experience or, conversely, in which they are overly expert, were other issues raised. There was a sense that not only the candidates but also the examiners are going through the motions by rote.
“It is a tick-box exercise for the examiner, who isn’t really an examiner, but merely an observer” 02001
Clinicians may feel that students can pass OSCEs without understanding what they are supposed to be doing; students can gain checklist marks very easily, even if they perform relatively poorly. Raters felt that OSCEs do not discriminate well between students: marking systems do not adequately reward students who demonstrate excellence, while mediocre students are often unfairly rewarded because of the way OSCEs are marked. These issues frustrated examiners. Of major concern is that weak students can pass OSCEs by going through the steps correctly, even when examiners feel that the student does not understand what they are doing.
“I suppose you could say in theory anyway the OSCE is very clear in what the marks are given for so maybe it’s very fair, but I sometimes think it’s not fair to the outstanding students, and maybe even the weaker students get too much marks for just getting the basics right” 02004
“It was so well rehearsed by the students, there was one student who came in and did it absolutely beautifully, and you could tell that this person just had a connection and understood what was going on and maybe had done an elective where they’d done this with patients a few times and the whole story flowed, whereas everybody else ticked the boxes. And they all did very well, because they ticked the boxes, and they explained the relevant points to the patients, but in one or two it was done brilliantly. But they didn’t excel for that because there was no box that said ‘did this person excel in how they did this?’” 02001
The ease with which poorer students attain marks was felt to create a discrepancy between marks attained in an OSCE and how the student will later perform in a real-life setting. Some reported little correlation between OSCE performance and how candidates will actually perform as doctors. This is a fundamental area in which OSCEs are perceived to lack authenticity.
“This is a person who you would not have confidence in as a doctor. He might know his stuff, he probably knows all his stuff but you would not have confidence as a doctor. But what I find difficult with OSCE is that there usually isn’t a place to record that, I mean if you go through the marking system, he would actually pass” 02004
Many raters discussed student nerves. Beyond the perceived problems with stations and marking schemes, nerves sometimes prevent students from giving an accurate account of their skills. Examiners know that stress can affect student performance, but they cannot tell whether a student would behave similarly under the stresses of a clinical environment, which were felt to differ from those of an OSCE. This is another source of discrepancy between performance seen in an OSCE and real-life behaviour.
“The stresses of an examination are much different and I think a lot of, I think more candidates under-perform on the day of the examination than over-perform on the day. I think that would be my humble opinion about it and especially in the clinical situation, not in a written examination. So for that reason I think that it’s good to give the benefit of the doubt” 02010
Raters discussed how the shortcomings of OSCEs can have unwanted effects on student behaviour. One such effect is that some students seem to tailor their learning towards passing the OSCE rather than towards how they will need to perform in real life. In raters’ view, students are learning to pass examinations rather than learning to practise medicine.
“even today I was teaching the first years and they didn’t want to know about how to put it together, they just wanted to know how the OSCE worked and how they got marks, and it’s very frustrating because we’re trying to teach them to be doctors, not OSCE-passers” 03003
Students can be seen to put on a performance instead of properly engaging with patients.
“Because this guy is very exam oriented, he’s just ticking all the boxes, he- I won’t say he lacks empathy but in this just video clip, probably he did demonstrate that, he’s not thinking about the patient. He’s thinking about what are the points he needs to cover. He didn’t ask his name. There was no kind of, interaction between the patient and the student, um, he just, he was just getting his findings ready to tell the examiner, so that he can score, he can score marks. He didn’t say bye to the patient.” 03007
The effect of OSCEs on student behaviour extends beyond preparation for, and performance in, the OSCE itself. Raters described how students learn to pass the exam rather than immersing themselves in learning the skills, and felt that the OSCE format encourages superficial learning. Students were felt to be mentally ticking off items on a list rather than reasoning through tasks logically. Raters noted that at times students were performing by rote rather than responding to the issues of the patient in front of them, an unforeseen and unwanted consequence of OSCEs.
“I think that’s a risk of OSCEs actually, I think you can learn how to pass an OSCE as opposed to learning the content and how to actually work in reality” 01002
“He knows what he’s about but I’m concerned that he’s going by a list rather than thinking logically” 03002
Overall, the false nature of many aspects of OSCEs, and its effects on student learning and behaviour, are prominent in raters’ minds while they are rating students. These obstacles lead examiners to search for glimpses of authenticity in an artificial environment.
Theme 2 Looking for glimpses of truth
Raters’ firmly held, but previously not fully examined or expressed, belief that most aspects of an OSCE are ‘fake’ leads them to seek out glimpses of a student’s true ability. Raters search for authenticity and are on the lookout for students who are just going through the motions by rote. This can lead examiners to reward or penalise idiosyncratic elements of a student’s performance. Some look for evidence of experience via technical skill or familiarity with the clinical environment; others say the rapport a student builds with the patient is something that cannot easily be simulated. An important differentiator for some raters is whether students are sticking rigidly to a mental checklist, indicating superficial understanding, or working through a station logically, indicating deeper understanding. Some raters prioritise how safe a student would be as a newly qualified doctor. These factors can influence how stringently or leniently a rater marks a candidate.
Amid the falseness of the exam situation, some examiners look for markers that a student has spent a lot of time on wards or with patients. Familiarity with how to approach patients, or evidence of time spent on wards such as knowing how the beds work, is noted as an authentic sign of a student’s experience.
“It’s all the little things that they do at the beginning that makes you think, ‘oh they know things, they’ve seen this, they’ve done this’. They know what they’re like, they automatically watch rounds, or automatically shake hands with the patient, introduce themselves, they just establish a kind of baseline of things, you know, professional things, that you’d expect people to do and you think, ‘oh this person gets it, off you go’!” 02003
Some raters look for indications that a student is well practised, often in physical examination technique. This can help raters differentiate very good students from more average ones. Raters value different aspects of examination technique: for example, when watching an abdominal examination station, some had very specific comments about how they want students to ballot the kidneys or position their hands, while others focused on the degree of exposure of the patient or other aspects of the examination. Raters acknowledged that students could achieve marks by carrying out the examination correctly, but for many, what mattered was whether a student was simply following a checklist or was engaged with the process and would actually pick up abnormal findings.
“His percussion is good, which is quite discriminatory. Experienced students, as he clearly is, get a nice confident percussion technique with a nice, clear percussion note” 01001
“You can almost see them feel a liver edge really, and certainly when they’re percussing you should be able to hear it. But you can kind of tell whether they’ve got it or not really. Even if there’s no liver edge to feel really, you can tell that they would do, if it was there. It’s almost like a feeling thing, you kind of- it almost feels like a feeling thing yourself. So it’s not just a flat looking, it’s a 3D assessment somehow, that you’re making” 01003
“It would take a little more time than what he has done here, to, for the information to be transmitted from your fingers to your brain, that there was or wasn’t a thrill or a heave there. In this instance, you may say that that’s just a guy that’s just going through the motions, at this point because if there surely was something there and it’s subtle, it may take a few more seconds …to put your fingers between the ribs, and see which chamber which may be contributing to the, the heave and so forth, you know? Erm, but, he’s done the process” 02001
Some raters use clues that indicate a student has a lot of experience with patients. Raters look at how students interact with patients; some prioritise rapport with patients over technical issues and reward it in their marking.
“I probably heavily, heavily weight people on how they interact with a patient. The way my mind works is that if they do that really well, I’m less likely to notice small issues with practical issues, whether they’ve missed a question or missed one tiny part of the examination but because overall I’d be thinking that this is actually a good candidate. Because they have actually got that first really critical part, they’ve got a rapport with the patient therefore they’re likely to get a good history where the patient’s relaxed they’re likely to do a good examination.” 03001
“Language is important isn’t it and people probably don’t realise the messages they’re giving with the language they use. More experienced students who’ve been around patients will use language differently, I think, so it’s subtle but it shows. They probably don’t realise they are doing it cos I suspect, I think the patients are the best feedback actually because they’ll respond differently to the way you speak to them” 01002
Raters sometimes note clues as to how a student will perform as a doctor upon qualification. Some benchmark the student against recently qualified doctors and consider whether the student would be safe, or unsafe, as a doctor. Signs that a student may be unsafe act as unofficial red flags.
“I suppose in my head I’m thinking, is this is a guy who’s showing me what he can do on a mannequin, or is this a guy who’s showing me what he can do in real life? And to me I was thinking, I would trust this guy to do the first assessment of a surgical patient with an acute abdomen erm, so yeah that’s, at that level I would have thought it was” 03001
“I think, with any exam, I am fundamentally looking for the person that’s unsafe, and that might be because they’re too confident, they’re not looking after the patient, they’re doing things that make me think, ‘you’ve not been near patients’, or ‘you might make decisions that are over your ability’. These are the things that would really concern me. Obviously if he’d done every single thing wrong, then yes, I would probably say no, he fails, but I think he was a caring person who will learn, and that’s what you kinda want I think that’s what we’re trying to achieve from our, people leaving our school” 03003
Some raters describe relying on their own gestalt during an OSCE rather than sticking rigidly to a marking system. There is a sense that, since the OSCE is so inauthentic, a rater has to use their own criteria and instincts about student skills. Sometimes judgements are not based on what a rater actually observes, as when this rater predicted how a candidate would have behaved had he not been under time pressure.
“I think my thought process with that student was that he was doing some things effectively but against the rush of time, so that was making him make some errors and if there had been an abnormality, I think he would have gone back to listen or to check” 03003
These assumptions sometimes go beyond the scope of the exam and relate back to how students are taught. Raters sometimes explain away a candidate’s shortcomings by reference to things they have not seen. In such cases, examiners set aside the shortcomings they have identified because, for them, their own judgements about the candidate carry more weight.
“I don’t know what instructions he’s been given or how he’s been trained (…) I’m sure he would, because he’s very good I’m sure he would normally do the other things” 01001
“He used the prostate cancer as the first thing. I think that wasn’t his fault, that was whoever gave him the lectures on urology and he used benign prostatic hypertrophy and that’s also to blame whoever gave him the lectures on prostatic disease” 02010
Such raters may override parts of the marking system or exam if they feel that a good candidate is being unfairly disadvantaged by the constraints of the system, but may be less likely to do so for a student they perceive to be less genuine.
“sometimes you see a candidate who is fantastic in terms of their skill, OK, and the second candidate was very good in terms of his skill. But another candidate might be very poor with the patient. Or unpleasant to them, or something like that. And so they get marked down for that but there’d be some degree of compensation. And I’m not quite sure, that’s the bit, I’m not quite sure I’d want them as my doctor. I kinda try and resolve it, how would they be if they were my houseman, would they- that’s the kinda bottom line, am I giving them a score that reflects that? at the bottom line” 01005
Theme 3 Evolution with experience
Raters tend to feel that OSCEs are a performance or a charade and, in turn, look for signs of what they consider to show a student’s actual ability authentically. These pre-existing beliefs, which examiners bring with them into an OSCE, are likely to influence their marking. Raters also adapt to the nuances of the different OSCEs they are involved in and carry this knowledge forward. Some differences became apparent between experienced and less experienced raters. Less experienced examiners harbour an unspoken fear that they may be letting an incompetent student progress; they wish to be seen to be doing everything correctly and stick more rigidly to the instructions. Conversely, more experienced examiners convey a sense that they are aware of the failings of OSCEs, have developed their own idiosyncratic beliefs about what demonstrates authenticity, and are comfortable bending the instructions to some extent.
The flaws of OSCEs may make them poorly discriminating, but some raters find ways to reveal what they believe is a student’s true ability. Some experienced raters develop their own strategies for differentiating between students, whereas less experienced examiners are less likely to deviate from instructions.
“We all probably have our favourite things that we think want to be in there, and weight those more or less than perhaps we should do according to the checklist in front of us” 01003 experienced examiner
“So that’s a good incisive traditional medical student question, ‘what did the doctor give you last?’ Which I would always reward. Mightn’t be on the OSCE sheet” 02002 experienced examiner
Experienced raters have come to accept the shortcomings of OSCEs and are more comfortable with their own idiosyncratic ways of looking for authenticity. Less experienced examiners are still unsure and torn, and are more likely to adhere strictly to the instructions they have been given.
“this is maybe a more general comment- but when you have a list, as an examiner, with different things to pick off, and then they don’t do certain things, you know, how hard do you then come down on the student, for missing out signs? Or how much do you actually use that more intuitive, global impression that you have cos they maybe missed one or two, even three things? I do know people who, who would say, you know like with the previous candidate, ‘if you don’t ask about suicidal ideation in a depressed patient, then that’s a fail because that’s a question’. I personally would feel that is harsh if they perform well in other respects” 03005 experienced examiner
“When I’m examining I have to be fair, and I will only tick the boxes he has performed. I cannot tick the boxes, assuming that he will be fine” 03007 less experienced
More experienced raters were also more likely to intervene and move a candidate on during an OSCE station if they felt that the candidate was wasting time. Less experienced raters described the frustration of watching students spend time during an OSCE on things that are not on the checklist and therefore earn no marks.
“Our OSCE training is that we’re impartial, but to bring out the best in students, which is actually what we want to do, you’re not impartial. It’s that you may facilitate them to be better, and that might mean that I would have moved that student along, and that’s not impartial. But I know that other examiners may not do that. But I kind of feel that my role is to see how good they are, and particularly in the final year, that I want to see what they’re like because I want to think, are they going to be good doctors?” 03003 experienced examiner
“You spend a disproportionate amount of time looking for them to say these things and I’m always aware in history that they can be doing good quality history taking, getting into more details, but they’ve already got the point. Even though it’s good, and they’re delving into ones you’ve mentioned, key word, or that’s the key question, you have your point” 02006 less experienced
Examiners are aware that they and other examiners might rate candidates differently. Co-examining with first-time examiners was described as challenging because new examiners tend to question every small detail of a candidate’s performance. Examiners were aware that their own methods of rating OSCE candidates had evolved over time.
“it was a nightmare working with him all day because every minute he was stopping me or one of the other examiners to ask a query, what should I do, should I fail them on this, or that or whatever, and you know it was a function of just, he wanted to be seen to do everything right, he had probably this fear that he was going to release this unqualified person onto the community. So all reasonable, and he was doing all the right things, but same time, if he’s after a few years of doing it, he’ll have a much more balanced view of the whole thing” 02005
“the first year or two I would have been very very conscious of everything, all the boxes ticked, and being mechanical in how you did it. But now, I have a more overall view of, an even, I suppose, sit back and have a sense of the overall standard as the day goes by as well” 02008
Experienced examiners tend to feel that they are rating only one of a number of stations. They also know that, even though these are final examinations, students do not have to achieve perfection and will continue to develop and learn as newly qualified doctors.
“I expect them to learn more as an FY1 and FY2, that they’ve got the tools of the trade now but they actually fine tune them with real patients because not having responsibility and having volunteer patients here, there is an element of artificiality which limits their learning of nuances. They’ve got the tools, basic tools, and they can refine them later on” 03002