The results are presented as follows: the included studies are first given a general overview and a descriptive-narrative summary (Supplementary 3), after which the six research questions are answered.
Descriptive summary of the included studies
This review includes 49 studies in total, published between 2000 and 2023. The majority of studies (26, 53%) were carried out in the USA, followed by Canada (7, 14.3%), Australia (5, 10.2%), England (4, 8.2%), the Netherlands and Denmark (2 each, 4.1%), and Finland, Scotland, and Switzerland (1 each, 2%).
The highest percentages of studies focused on surgical specialists (10, 20.4%), anesthesiology groups including anesthetists and nurse anesthetists (12, 24.6%), and emergency groups (12, 24.6%). The most common target group was specialists in various medical fields (22, 44.9%), followed by nurses (9, 18.4%).
Types of simulation methods used for SBAs in HPE
Simulation methods used in SBAs for health professions education include standardized patients/patient simulators (20, 21, 22), anatomical models and advanced patient simulators (APS) (23), the Objective Structured Clinical Examination (OSCE) (21), high-fidelity manikins and synthetic models (21, 24), virtual clinical stations and virtual structured clinical examinations (21), cadavers (25), animal tissues and models (24, 26), simulated environments (e.g., an operating theater) (27), computer and electronic simulations including the NeuroTouch simulator (28), a bronchoscopy simulator (29), a colonoscopy simulator (30), a transesophageal echocardiography simulator (31), and the Cisro endoscopy simulator (32), virtual reality (VR) simulators such as Mimic Flex VR (26, 29) and EyeSi (33), and augmented reality simulators such as CHARM (34).
Target groups examined in SBAs in HPE
Studies of SBAs in health professions education cover a range of target groups, including specialists (29, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45), pre-hospital care providers (PHPs) (46), nurses (20, 47, 48, 49, 50, 51, 52), emergency medicine personnel (21, 53), respiratory therapists (54), medical emergency technicians and paramedics (54), pharmacy personnel and pharmacists (27, 55), intensive care providers (56), family physicians (39, 43), and midwives (57).
Applications of SBA methods in HPE
The extracted studies mention the following applications of SBA methods:
- Assessment of competence, knowledge, performance, and performance sequence (49, 56, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72),
- Clinical judgment and diagnostic power (56, 60, 62, 65),
- Critical thinking (60),
- Technique of obtaining case history (73),
- Patient counseling and guidance abilities (73),
- Simple and complex clinical reasoning and decision-making skills (73),
- Communication and remote patient management skills (in simulated patient method) (65, 74),
- Technical and non-technical skills (42, 45, 56, 60, 61, 63, 65, 69, 72, 75, 76, 77, 78, 79, 80, 81),
- Deep reflective and contemplative thinking skills (62),
- Management of uncommon findings (62),
- Examination skills for sensitive organs (such as chest and pelvis) (62, 82),
- Teamwork and interaction skills (62, 65, 72, 83, 84),
- Leadership skills in emergencies (62),
- Risk management and preparedness in disasters and crises (69),
- Management of common and rare cases (85),
- Formative and summative evaluations with high-stakes consequences for individuals, such as the issuance or renewal of board licenses and certificates (21, 37, 39, 43, 86, 87, 88),
- Assessment of different levels of Miller's pyramid (knows, knows how, shows, does) (89),
- Evaluation of curriculum components that cannot be assessed using traditional methods (37).
Challenges of SBA methods in HPE
In the extracted studies, the following challenges of using SBA methods have been mentioned:
- Lack of realism in current simulators, such as failure to simulate mental and psychological states (62),
- Difficulty in generalizing performance from simulators to real-world environments (41),
- Limited assessment of a small sample of skills with simulators and the lack of a comprehensive indicator of overall physician performance (41),
- Individual performance assessment instead of team-based assessment (41),
- Failure to evaluate factors such as environmental impact on performance (41),
- Difficulty in defining precise performance (41),
- Challenges in defining and measuring non-technical behaviors (e.g., timely arrival at the workplace, individual behavior in a group, listening to recommendations, empathy towards patients, professional and courteous behavior) (41),
- Differences in the comfort level of test-takers working in familiar real-world environments versus unfamiliar simulations (41),
- Difficulty in defining competencies and capabilities (63),
- Challenges in evaluating competencies through simulation due to the lack of simulation of real-world performance conditions (65),
- Setting standards for assessment (65),
- Appropriate sampling (65),
- Behavior change in simulated patients (65),
- Challenges in creating scenarios and synthesizing cases (65),
- Difficulty in simulating the human body (65),
- Difficulty in reducing performance and outcome measures (65),
- The presence or absence of a culture and policy of simulation use (69),
- The complexity of healthcare (69),
- The existence of multiple stakeholders (69),
- Variability of specialized care performances (56),
- Difficulty in the development and design of simulations (56),
- Difficulty in creating valid assessment tools (56),
- Designing appropriate scenarios (83),
- Lack of studies on suitability for high-stakes assessments (83),
- Religious and/or ethical challenges (e.g., in animal models) (82),
- Appropriate grading methods (85),
- Presence of numerous sources of error in SBAs, including assessors (85),
- Development of suitable programs (90),
- Engaging participant involvement (90),
- Appropriate checklist design (91),
- Limitations in the cases submitted for assessing each physician with SPs (91),
- Lack of standard scenarios (92).
Advantages of SBA methods in HPE
The reported advantages of SBA methods include high sensitivity and high specificity (20), high validity (23, 25, 30, 35, 36, 38, 48, 53, 93), satisfaction of learners and test-takers (27, 28, 34, 54), similarity to the real environment (23, 26, 43), reduction of patient harm (26, 29, 36, 37, 39, 87, 94, 95), reduction of learner and test-taker anxiety (21, 24, 26, 47, 86), complementarity of learning and testing (21, 24), correlation between simulator performance and performance in the real environment (33, 45), provision of feedback (26, 37, 87, 95), objectivity of assessment (24, 96), usefulness in emergency conditions (24), case diversity (87), replicability of simulation and reduction of disparities in assessment (95), ethical superiority (39, 95), increased interest in simulation (39), and the high reliability of structured assessment (89).
Disadvantages of SBA methods in HPE
The use of SBA methods has been criticized for its high cost (26, 57, 86, 89, 94), anxiety among users (21), lack of realism (94), limited validity (89), and disposal concerns (26). Some methods also evaluate only simple skills rather than advanced ones and may suffer from sampling errors.
Necessary conditions for using SBAs in HPE
Test objectives
Test objectives must be clearly defined and reviewed by peers, as stated by Decker et al. (23) and Glavin et al. (23, 37). Evidence can be used to identify objectives, competencies, and skills, according to Glavin et al. (23, 37). Assessment objectives should be explicit, as emphasized by Rizzolo and colleagues, Boulet and colleagues (52, 94), and Ennen et al. (57). It is important to precisely determine the assessment objective and to match the test design to the level of learners and staff, as highlighted by Kardong-Edgren and colleagues (51, 57).
Performance standards in assessment
Macario and colleagues emphasize the need for specified performance standards and consensus on assessment checklists (41). Identification of clinical activities that can be tested in simulation is also important (48). Researchers such as Crosby and colleagues note that precise determination of assessment criteria and the use of guidelines for developing assessment tools and criteria are necessary (36). Grand and colleagues suggest that performance assessment items should be specific and accurate, with a defined minimum passing score (21).
Assessment of examinees and standardized patients
Decker et al. and other researchers emphasize the importance of training both examinees and standardized patients. Prior practice in a simulated testing environment can increase comfort and decrease anxiety for examinees (23, 45). Assessors and facilitators must also be trained, according to Rizzolo et al., Lien et al., and Boulet et al. (80, 88). Dunkley et al. note that if simulation is used through a standardized patient method, the SP must be properly trained (24). Ennen et al. suggest that trained personnel should work as a team for SBA and education (57). Srinivasan et al. state that if a standardized patient is utilized, the individual must receive proper training (21).
Fidelity of the test (similarity to the real world)
Fidelity is a complex phenomenon that includes physical and psychological aspects (56). Reliable equipment enhances test fidelity (23). High fidelity is a key feature of assessment methods (54). High-fidelity simulations are more effective for education and evaluation, and high-fidelity simulation is more important for competency assessment than for performance assessment (24). Scenario fidelity is a central and fundamental principle for assessment validity, and technology can be used to increase fidelity (56, 86).
Video recording of examinations
Srinivasan et al. propose that high-powered fiber-optic cameras and microphones for video recording are essential for performance assessment (21). Podolsky et al. also recommend video recording for evaluating examinees (93). Video recording allows for repeated viewing of performance, according to Rizzolo et al. (52). Shah et al. suggest video recording for assessment and re-evaluation (26).
Expert opinions on assessment
Various studies suggest using SBA and involving experts and knowledgeable sources in administering assessments (41). Physicians and faculty members are recommended for validating assessment tools and identifying test cases, and the content of the assessment tool should be clear (37, 48). The designed tools should be reviewed by experts, and their accuracy should be determined precisely (30). Skilled individuals should be included in the assessment team (87). Obtaining and implementing expert opinions on scenarios is important, and simulation training should be observed before evaluation (57).
Application of the SBA technique
Dunkley et al. suggest that simulations used for competence assessment should recreate all aspects of everyday reality and that results should be interpreted carefully; performance criteria can be identified using evidence (24). Rizzolo et al. suggest that behaviors indicative of competence and ability should be clearly defined when using simulation for competence measurement (52). Hall et al. believe that simulations should not be used alone for evaluating knowledge (87). Technical and non-technical skills should be defined accordingly (24). Nabzdyk et al. suggest checklists for assessing technical skills and overall scoring scales for non-technical skills (56). Ennen et al. recommend valid and standardized scenarios and assessment methods for high-stakes assessments (57). According to Kardong-Edgren et al., further studies are needed to establish the appropriateness of simulation for high-stakes assessment (51).
Different simulation methods and techniques
Different simulation methods have specific conditions. Dunkley et al. suggest using artificial models and animal tissues for invasive procedures, while electronic simulation can evaluate knowledge and competency (24). Simulated environments can assess teamwork, communication, and crisis management skills (22). Nabzdyk et al. recommend modeling simulation scenarios on real events and choosing criteria and tools based on the skill or task being assessed (56). Schuwirth et al. advise against using the patient management problem (PMP) technique for high-stakes examinations, preferring instead open-ended questions and a limited number of questions (89). Checklists should be designed to measure the consequences of problem-solving correctly. Srinivasan et al. suggest creating an appropriate clinical environment for comprehensive mannequin simulation (21). Shah et al. recommend an SBA simulation tool with accurate anatomy, interactivity, physical, visual, and physiological realism, and haptic realism (26).
Validity and reliability of assessment tools
Studies have discussed the validity and reliability of assessment tools. The validity of assessment tools should be evaluated using frameworks such as Messick's validity framework (97). The reliability of assessment can be enhanced by increasing the number of cases, using blinded assessors, and standardizing and training assessors. Consultation with experts and piloting simulations are important for creating valid and reliable scenarios (52). Scenarios should be checked for content validity (23). Providing sufficient time for assessment increases its reliability; increasing the number of scenarios improves the validity and reliability of the assessment, and expert consensus can enhance validity (94). A large number of cases and a considerable amount of time are required to achieve reliability in the progress test (21). The reliability of the Angoff method increases with the number of stations, an appropriate time duration for each station, and a reliable and accessible checklist. If the assessment goal is at the "does" level of Miller's pyramid, reliability depends on the number of observations. Well-trained standardized patients must be selected to increase the validity of assessments using the standardized patient method (89).
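The link noted above between the number of cases or stations and reliability is commonly quantified in classical test theory by the Spearman-Brown prophecy formula; the expression below is an illustrative aside and is not drawn from the included studies:

$$\rho_{\text{new}} = \frac{k\,\rho_{\text{old}}}{1 + (k-1)\,\rho_{\text{old}}}$$

where $\rho_{\text{old}}$ is the reliability of the original assessment and $k$ is the factor by which the number of cases or stations is increased. For example, doubling a six-station assessment with reliability 0.55 ($k = 2$) would be predicted to raise reliability to roughly $\frac{2 \times 0.55}{1 + 0.55} \approx 0.71$.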
Feedback
Feedback is necessary for evaluating the assessment method. Hall et al. (87) and Roussin et al. (88) recommend simulations that offer feedback for formative assessment.
Curriculum change when using simulation-based assessment
Ennen et al. suggest designing an appropriate curriculum for simulation-based assessment, considering the field and level of test takers (57). They advise that teamwork scenarios should be more complex than those assessing individual performance. Rizzolo et al. recommend modifying and reviewing the curriculum for simulation use, ensuring contextual and conceptual alignment with teaching. They also suggest evaluating individuals under clinical conditions with the use of SBA and other assessment methods (52).
Timing of the assessment
Rizzolo et al. stress the importance of allowing ample time for test-takers to complete assessments (52). Boulet et al. also suggest that appropriate time allocation can enhance the reliability of assessments (94).
Cost of assessment
Boulet et al. note that the SBA method is expensive and requires the attention of investors (94). Rizzolo et al. also emphasize considering the cost-effectiveness of this method (52).