The utility of work-based assessments in higher general surgical training – a systematic review

Background: In the UK, work-based assessments (WBAs), including procedure-based assessments (PBAs), case-based discussions (CBDs), clinical evaluation exercises (CEXs) and direct observation of procedural skills (DOPS), are used in the Higher General Surgical Training Programme (HGSTP). This review investigates trainers' and trainees' perceptions of the usefulness of WBAs in HGSTP, based on the published literature.
Methods: A literature search was carried out in December 2018 using the MeSH headings WBA, PBA, DOPS and Mini CEX, together with their full forms. Seventeen surgical studies describing the usefulness of WBAs were retrieved. Usefulness was analysed according to van der Vleuten's utility formula: the product of educational impact, validity, reliability, acceptability, cost-effectiveness and feasibility.
Results: Among 6 studies on PBA, the validity, reliability and acceptability appeared good, and the educational impact was positive at Kirkpatrick levels 1 and 2. One study on Mini CEX showed positive Kirkpatrick level 1 satisfaction in trainees and trainers. In 2 studies, CBD had a positive impact at Kirkpatrick levels 1 and 2 and was rated valid and reliable by trainees. Two studies on DOPS showed good construct validity with a positive Kirkpatrick level 1 impact. In 6 studies, the use of multiple methods together, as in the Intercollegiate Surgical Curriculum Project (ISCP) portfolio, was viewed more negatively at Kirkpatrick levels 1 and 2. The recognised problems are lack of time, lack of faculty development and concerns about validity and reliability.
Conclusion: The individual WBAs appeared useful in study settings, but their perceived usefulness declined when they were used as multiple methods in practice.


Background
In line with changes in the world, the postgraduate training programme in the UK has undergone several changes, including the European Working Time Directive (EWTD), Modernising Medical Careers (MMC) and the Postgraduate Medical Education and Training Board (PMETB). 1 The traditional apprentice-based model of training and assessment during postgraduate training laid a strong emphasis on the knowledge component of the syllabus, using multiple-choice questions (MCQs), clinical examinations and vivas. 2 There is increasing public demand for, and scrutiny of, the quality of care provided by doctors and the training they receive. 3 When MMC was introduced in 2007, assessment methods were needed that test doctors in action (at the workplace). 4,5 Workplace-based assessment (WBA) is one of these systems and refers to "the assessment of day-to-day practices undertaken in the working environment" or, more simply, an "assessment of what doctors do in practice." 6
In the UK, after completing medical school, a new doctor enters a training pathway lasting several years before becoming a consultant. For surgical specialties, doctors complete 2 years of foundation training and then a 2-year generic core surgical training programme, after which they enter the Higher Surgical Training Programme (HSTP) through a competitive national selection process. 7 The UK HSTP has 10 surgical specialties: Cardiothoracic Surgery, General Surgery, Neurosurgery, Oral and Maxillofacial Surgery (OMFS), Otolaryngology (ENT), Paediatric Surgery, Plastic Surgery, Trauma and Orthopaedic Surgery (T&O), Urology and Vascular Surgery. General Surgery is one of the largest specialties, with an intake of approximately 100 trainees each year through the national selection process. 8
Several institutions are responsible for training in surgery. The Royal Colleges of Surgeons, through the Joint Committee on Surgical Training (JCST) and its ten Specialty Advisory Committees (SACs), for example the General Surgery SAC, set the curriculum standards for General Surgery. Schools of Surgery at deanery level and Hospital Trusts at the local level run General Medical Council-approved training programmes. The curriculum, delivered through the Intercollegiate Surgical Curriculum Project (ISCP), lays strong emphasis on specialty knowledge, clinical exposure, technical and operative skills, and professional skills and behaviour.
The ISCP has laid a strong emphasis on WBAs. These include Procedure Based Assessments (PBA), Clinical Evaluation Exercise (CEX), Case-Based Discussion (CBD), Direct Observation of Procedural Skills (DOPS) and multi-source feedback (MSF), and are intended to develop skills of 'performance' at the workplace. The main purpose of the WBAs is to help the trainee to learn and develop and to provide evidence of progression in attaining clinical competency. 9,10 They are supposed to be used as a form of formative assessment, providing feedback that helps the trainee's learning and development. MSF has been used for many years in many specialties and has been shown to help trainees to develop. 11 MSF assesses a doctor's behaviour and is completed by all members of the team in which the doctor works. The other WBAs need to be carried out by consultants or the trainee's supervisors, and assess knowledge, skills and attitudes in specific tasks at a professional level. Evidence supporting the use of PBA, CEX, CBD and DOPS in the higher general surgical training programme (HGSTP) was lacking, particularly at the time of their introduction. 12
The workplace of the HGST may differ from that of many other programmes. The HGST sees and treats patients on the ward as inpatients, sees patients in outpatient clinics, performs operations to treat several conditions, and looks after patients pre- and postoperatively. The skills that must be developed to complete training, particularly operative skills, differ from those of other specialties. This stage is also a transition between core surgical training (basic surgical training) and independent practice as a consultant. At this stage of HGSTP the trainee has some independence in the many areas where they are already competent to perform tasks in the workplace, while many actions, particularly surgical procedures, still need to be performed under the direct supervision of the consultant.
The WBAs used at this grade need to cover all the types of work these trainees do in their workplace, and need to be useful. This review was undertaken to assess the evidence on trainees' and trainers' experiences and perceptions of the usefulness of WBAs, particularly PBA, CBD, DOPS and mini CEX, in HGSTP since their introduction.

Search strategy
The literature search was carried out in December 2018 in the Medline database using the MeSH headings work-based assessment or workplace-based assessment or procedure based assessments or PBA or direct observation of procedural skills or DOPS or clinical evaluation exercise or CEX or Mini CEX or case based discussion or CBD. Altogether, 3440 articles dating back to 1997 were obtained as titles. All these titles were screened in PubMed by KA, and only articles describing different types of WBAs in any specialty were selected, which yielded 158 articles. The abstracts of these articles were screened and only those in surgical specialties were selected, which narrowed the articles down to 30. All of these articles were studied further, and only studies with participants describing usefulness were included, which yielded 17 studies. All 17 studies are included in the following tables. The inclusion criteria covered any study, quantitative or qualitative, describing WBAs in surgical specialties. Endoscopic procedures, gynaecological procedures, anaesthesia, renal, paediatrics, histopathology, nursing, medical students and medical specialties were all excluded from the analysis.
Emphasis has been placed on the general surgery training literature, although WBAs used in other surgical disciplines have been included. WBAs from other specialties are mentioned in places to describe history and background.
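The screening steps described above can be tallied as a simple funnel. The following is a minimal sketch in Python that uses only the counts reported in the text; the stage labels are paraphrased, and the function name funnel_report is a hypothetical helper rather than part of the review's method.

```python
# Screening funnel for the literature search, using the counts reported in the text.
# Stage labels are paraphrased; `funnel_report` is a hypothetical helper name.

STAGES = [
    ("Titles retrieved from Medline", 3440),
    ("Articles describing WBAs (any specialty)", 158),
    ("Articles in surgical specialties", 30),
    ("Studies with participants describing usefulness", 17),
]


def funnel_report(stages):
    """Return (label, kept, excluded_at_this_step) for each screening stage."""
    rows = []
    previous = None
    for label, kept in stages:
        # Nothing is excluded at the first stage; afterwards, the number
        # excluded is the drop from the previous stage.
        excluded = 0 if previous is None else previous - kept
        rows.append((label, kept, excluded))
        previous = kept
    return rows


for label, kept, excluded in funnel_report(STAGES):
    print(f"{label}: kept {kept}, excluded {excluded}")
```

Laying the counts out this way makes it easy to check that each stage is a subset of the previous one and to see how many records were dropped at each screening step.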

History of WBAs in the UK
A study of WBAs carried out in the UK in 2003 and 2004 to evaluate their use in medical specialties showed them to be feasible and able to distinguish reliably between doctors' performances. 13 With the introduction of MMC in 2007, they were introduced in all specialties, including surgery, and for all grades in training. Since their introduction they have been refined, developed and modified. For example, for foundation trainees WBAs have been renamed supervised learning events (SLEs) and modified to emphasise feedback. 14 Some WBAs are specialty-specific: PBAs, for example, are used in surgical specialties to cover performance of actual operative procedures in the operating theatre, while others, including DOPS, CBD and mini-CEX, are common to all specialties.

Types of WBAs in HGST and definitions (PBA, Mini CEX, CBD, DOPS)
As discussed above, MSF is a very important and useful marker of doctors' behaviour in all specialties, including HGSTP, and is not a focus of this review. We will instead explore the other WBAs, whose evidence base in HGSTP is not as strong. Table 1 shows the definitions of the 4 types of WBAs used in HGSTP.
The levels of assessment can be represented as a pyramid (Figure 1). 22 In this pyramid, the sequence from the lowest level to the highest is knowledge (knows), competence (knows how), performance (shows how) and action (does). Work-based assessments are methods of assessment at the highest level of the pyramid. Assessing the trainee in 'action' tells us what happens in the workplace, rather than under the artificial testing conditions of the lower levels. The trainee's 'action' collected in this way gives us information about the trainee's performance in day-to-day practice. Other methods of assessment, such as MCQs, simulation tests and OSCEs, represent the lower levels of the pyramid. Assessments targeting the highest level of the pyramid can be carried out by an assessor observing trainees in the workplace. 22

Utility formula
The usefulness or utility of an assessment has been defined as the product of educational impact, validity, reliability, cost-effectiveness and acceptability. 2 Subsequently, practicality (feasibility) was added as a further component of the equation. 23

Impact of WBAs (Kirkpatrick)
To evaluate the impact of teaching and learning programmes, Kirkpatrick (1967) suggested a hierarchical evaluation pyramid rising from level 1, 'satisfaction', through level 2, 'learning', and level 3, 'behaviour', to level 4, 'results' (Figure 2). 24 Educational impact can also be assessed against Barr's criterion of 'benefits to patients/clients'. 25 Two systematic reviews so far have looked at WBAs and found level 1 or level 2 impact, corresponding to participant "satisfaction" and "learning" in the original Kirkpatrick model. 12

[...] and 2). They thought that focusing on the quality rather than the number of assessment events might improve the educational benefit gained by trainees. This was a large-scale national study, but it included only the views of trainees in the orthopaedics training scheme, without any views from trainers.
Another study of 60 orthopaedic trainees and trainers, using survey questions, showed that there was confusion about their purpose and the method of using them. The comments made about them were mainly negative at Kirkpatrick level 1. 35 Shalhoub, Marshall and Ippolito (2017) 36 [...]

Table 2

In summary, the 6 publications to date on PBA (5 quantitative, 1 qualitative) suggest that the validity, reliability and acceptability of PBA are good. The educational impact is positive at Kirkpatrick levels 1 and 2, particularly when PBAs are used correctly, with adequate planning, as formative and feedback tools. However, some Kirkpatrick level 1 responses were negative because of confusion about how to use them. 32-37

Mini CEX
Only one study in a surgical specialty could be found, in postgraduate year 2 trainees in general surgery. It showed positive Kirkpatrick level 1 satisfaction in trainees and trainers. 38 Several studies evaluating the mini CEX have been performed, including one meta-analysis, but all of these are in non-surgical specialties. They have shown a medium combined effect on construct validity 39 and a positive level 1 to 2 effect on the Kirkpatrick model. Negative points are that the mini CEX can be anxiety-provoking and that trainees use lenient assessors to get better scores. 40-43

Table 3 Mini CEX studies

CBD
Only 2 studies describing CBDs in surgical specialties have been published, one in the context of basic and higher surgical training and the other in otolaryngology. These studies show that CBD has generally been received positively at Kirkpatrick levels 1 and 2 and is considered valid and reliable. Neither study included trainers. 44,45

Table 4

Looking outside surgery, one from medicine 46 [...] there was an inverse relation between the complexity of the case and their use, meaning that they are used more often by core trainees than by HGSTs. 21 Another study, in ophthalmology, showed an increase in satisfaction (Kirkpatrick level 1) from year 1 to year 3. 48

Table 5

Outside surgery, a survey returned by 25 of the 27 pre-registration house officers completing the assessments was positive, with most (70%) feeling that direct observation helped to improve clinical skills. 50

Multiple methods (WBAs)
Rather than looking at individual WBAs in isolation, we should look at multiple WBAs covering all the types of work undertaken by the trainee in the workplace. 51 In the surgical context, 6 studies looked at the impact of multiple assessment methods on education and training. As shown in Table 6, several problems have been identified, including a negative educational impact at Kirkpatrick level 1. 51-56

Table 6

Annual Review of Competence Progression (ARCP) process and number of WBAs
All HGSTs undergo an ARCP each year. Each trainee needs a minimum number of WBAs to achieve satisfactory progress. While the London deanery has set this number at 80, the minimum number of WBAs required in the other deaneries is at present 40. One study found that the most favoured number was 18. Quality rather than number should be the priority when WBAs are checked at ARCP.

Purpose of WBAs: formative or summative
ISCP (2013) 16 states, "WBAs are meant to provide constructive feedback and support learning in the first instance. WBA validation should be carried out immediately afterward following a clinical encounter. There should not be pass or fail but part of learning and development. Those carrying these out should have relevant qualifications, experience and appropriate training including constructive feedback". The ISCP then also states, "they can be used as a summative assessment at the end of the training placement by the ES and counts towards the end of placement ARCP". 16 These statements cause some confusion about whether WBAs are to be used formatively or summatively. The WBAs themselves are not supposed to be treated as a summative test, although they can form part of the portfolio of evidence submitted at the annual review. 57 One contributing factor to trainees' lack of engagement with WBAs has been their use as a summative tool. 58

Discussion
The WBAs have been in use in HGST for more than 10 years. There are not many studies in general surgery. Ten of the studies included here are not purely from general surgery, but many of their points are similar and also apply to higher general surgery. The most useful WBA in these papers appears to be PBA, followed by CBD. DOPS and mini CEX do not appear as useful in HGST. 54 As seen above, the popularity of DOPS increased among basic surgical trainees after they were modified, and there was an inverse relationship between seniority in training and their use. 21 In HGST, the mini CEX, in which a trainee is observed in clinical encounters such as history taking, examination and breaking bad news, may matter less because trainees at this stage carry out these tasks independently in day-to-day practice. This may have reduced its usefulness. Observation of consent taking may be useful, although finding time to be observed by a trainer is still a problem.
The minimum number of WBAs is at present set at 40 in most deaneries. One study in orthopaedics suggested that 18 was the most favoured number, which is also the number used in the foundation training programme. 34 The number should be high enough to maintain reliability while still preserving quality. Requiring 80, as in some deaneries, may reduce quality by turning WBAs into a tick-box exercise. Checking the quality of WBAs at ARCP, or by an educational supervisor, may help trainees' learning. As these studies show, using them formatively rather than summatively would increase their value.
Generally speaking, the perception of WBAs is becoming more positive with time among both trainees and trainers, and most believe that these tools are useful if used correctly and formatively. They help to develop specific skills. Interestingly, although WBAs considered alone had good utility according to van der Vleuten's utility formula, as evidenced by good validity, reliability and acceptability and a positive Kirkpatrick level 1 or 2 impact, the same was not seen in real scenarios when they were used together in the ISCP portfolio. This may be partly because the way they are used under study conditions differs from how they are used in real practice.
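Because the utility formula is a product, a low score on any one component lowers the overall utility however strong the others are, which is one way to read the contrast described above. The following is a minimal sketch only: the 0 to 1 scoring scale, the function name utility and all of the scores are hypothetical illustrations, not data from the included studies.

```python
from math import prod

# Components of the utility formula as listed in the text.
COMPONENTS = (
    "educational_impact",
    "validity",
    "reliability",
    "acceptability",
    "cost_effectiveness",
    "feasibility",
)


def utility(scores):
    """Multiply the component scores (assumed here to lie between 0 and 1)."""
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return prod(scores[name] for name in COMPONENTS)


# Hypothetical scores, invented purely for illustration.
study_setting = {name: 0.9 for name in COMPONENTS}
# Same tool, but with feasibility reduced (for example by lack of time).
routine_practice = dict(study_setting, feasibility=0.3)

print(f"study setting:    {utility(study_setting):.2f}")
print(f"routine practice: {utility(routine_practice):.2f}")
```

In this sketch a single weak component pulls the product down sharply, and a component scored at zero would reduce the utility to zero regardless of the other scores.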
With ongoing change and the desire to improve, there is a need for a global professional assessment of the trainee, and an assessment tool called Entrustable Professional Activities (EPA), assessed by multi-consultant review (MCR) using capabilities in practice (CIPs), has been suggested for the future. 59,60 These are intended to be introduced in the HGSTP next year. While these new forms of assessment, including EPAs and GPCs assessed by CIPs and MCRs, should be incorporated for professional and global skills, WBAs should be retained for the development of specific skills. It is important that they are used properly so that trainees get the maximum benefit from them. Large-scale national and international studies are needed to reassess the impact of these tools and to improve them, helping the development of general surgical trainees.

Table 1 Definitions of the 4 types of WBAs used in HGSTP

PBA
[...] 15 and subsequently refined by the SAC for surgery for use in all the surgical specialties. The form has 3 principal components (ISCP): feedback (trainer and trainee), a series of competencies within 5 domains (preop, consent, exposure/closure, intraoperative, post-op) and finally a global assessment that is divided into 8 levels of competence. The highest competence rating means that the trainee is able to perform the procedure without supervision and deal with complications to the standard expected of a day 1 consultant in the National Health Service (NHS) following CCT.

Mini CEX
CEX was originally designed by the American Board of Internal Medicine and took 2 hours to complete. 17 The mini CEX, a shortened version, was then developed and has been modified for surgical clinical encounters. 18 These include observing the trainee's interaction with a patient in events such as history taking, physical examination, consent (CEX-C), professionalism and communication skills in breaking bad news.

CBD
The CBD was originally called chart-stimulated recall (CSR) by the American Board of Emergency Medicine. 19 A modified version called case-based oral assessment was used by the GMC to assess failing doctors, using 2 assessors who assessed a doctor on material including a portfolio of cases managed by the doctor. There was a good representation of daily activity. Finally, it was developed into CBD and is used across all specialties. Currently, it is supposed to be conducted as a structured in-depth discussion of the trainee's management of a patient and the justifications for their actions. It explores the knowledge, judgement and clinical reasoning of the trainee, testing a higher level of thinking and promoting deep learning in Bloom's Taxonomy. 20

DOPS
This was originally developed and evaluated by the Royal Colleges of Physicians in a range of basic diagnostic and interventional procedures, assessing the trainee's technical, operative and professional skills. It has been modified to include a range of procedures in the surgical context, including in operating theatres. 21