Evidence from educational papers and validity definitions suggests that ST for LTS development should be validated (15, 18, 35). Our systematic review highlights evidence of the feasibility, utility, and importance of multiple validation strategies in the development and presentation of laparoscopic simulation models. Heterogeneity between papers is rampant, largely attributable to the unavailability of standardized validation strategies for simulation models in surgical education. One of the most vital difficulties lies in the fact that the terms used across studies are not standardized, leading to conflicting and incomplete information on participant experience levels. Terms such as “novice” are not precisely defined in the literature; authors variously define novices as medical students, first-year residents, or residents of any year, even though these represent very different populations with high variability in informal training and skill acquisition. Furthermore, some studies provide no data on previous training or laparoscopic experience when differentiating experience levels, further hindering cross-study comparisons of ST.
The reviewed literature showed that most validated simulations are carried out on VR, using both visual and haptic equipment. Other ST currently in use include synthetic models, live or cadaveric animal models, and human cadaveric specimens preserved by freezing (2, 4–7). All of these simulation tools are used to train skills through standardized exercises and repetition (7), offering the opportunity to objectively measure skills and evaluate progress (10), and allowing trainees to develop the resistance and strength sensation required for laparoscopic surgery (11).
While previous studies suggest residents prefer training on frozen human or animal cadaveric models (12), no validation of these ST was found in the reviewed literature, even though they were the preferred models among students. High costs and difficulties in the preservation of cadaveric specimens are strong drawbacks that may contribute to the scarcity of literature objectively evaluating their usefulness. Multiple systematic reviews focus on the effectiveness of specific STs, such as virtual or cadaveric simulation (13, 15), but do not compare results across different STs.
Most authors attempted to assess face validity; however, no standardized validated score exists for this purpose. Questionnaires and scores tend to vary, and even the questions on surveys are not comparable, which represents a source of heterogeneity. Given the high variability of the questionnaires used and the absence of a standardized strategy to validate ST, objective comparison between papers is not possible at this time.
All studies evidenced some measure of improvement in LTS when using simulation, which is supported by the current literature and suggests that simulation training in laparoscopy is useful for developing LTS in residents (1–3). Simulation exercises allow students to practice in a controlled environment, providing a safe space to make mistakes while receiving feedback from teachers and experienced clinicians, thus reducing the risk of errors in future procedures (36). Laparoscopic simulation of some kind should certainly be included as a requirement of general surgery residency programs (1), ideally before residents perform any surgery on patients (4, 16).
As stated previously, it is not possible to determine which ST performs best for general surgery residents because the heterogeneity of models and the lack of standardized definitions, measuring tools, and validation strategies limit comparability between them (4, 35). Even though the quality of individual studies was moderate, as evidenced by the MERSQI tool, these differences in methodological procedure prevent adequate comparison between the different STs. Despite this limitation, improvement in LTS acquisition after simulation has been described regardless of the model, validation methods, and validation type, which reinforces the need for simulation during surgical training (1, 4, 5, 6, 9, 21, 27–32, 34–36).
Face and content validity
Two studies, Bahsoun et al. (25) and Achurra et al. (26), evaluated BS models. Although they used similar ST, there were substantial differences between them, particularly in the subjects included in each study. Owing to a lack of clarification on expertise levels, the study populations were not comparable: Bahsoun et al. described their population as “residents and experts”, whereas Achurra et al. defined theirs as “novices and experts”, and each author used different selection criteria to classify participant expertise. In both studies, face validation of these ST was performed with non-standardized questionnaires developed for the singularities of each BS model; in accordance with the literature, the lack of standardized and validated questionnaires can be a confounding factor in multiple scenarios (36).
Face and construct validity
Four studies assessed both face and construct validity. In this subgroup, most ST were VR, consistent with the current prevalence of this type of simulation for laparoscopy training (2, 4–7). De Moura Júnior et al. (27) presented a BS model; face validity was assessed by means of a self-administered survey that tested both satisfaction with the model and perception of performance. Sánchez-Peralta et al. (28) and Sankaranarayanan et al. (30) each presented a different VR model. The main distinction between these two models was that Sankaranarayanan et al. included assessment of environmental factors that may interfere with surgical outcomes, such as operating room noise and interruptions, which can be incorporated into the immersive VR environment and should be considered during training, giving this model an interesting additional perspective.
Lahanas et al. (29) proposed an augmented reality simulator in which virtual objects were superimposed on reality. Participants were described as either novices with no previous laparoscopic experience or experts, but no further information was provided. It is the only study that proposed an augmented reality simulator (29).
Face, content, and construct validity
Only four of the included studies, Van Empel et al. (21), Pérez Escamirosa et al. (32), Kawaguchi et al. (33), and Arts et al. (34), assessed all three types of validity: face, content, and construct. Two assessed VR STs and two BSs. Van Empel et al. (21) assessed face and content validity by means of a questionnaire, while construct validity was appraised through a series of tasks. Pérez Escamirosa et al. (32) presented a ST and assessed face and content validity with a questionnaire comprising 13 statements answered on a five-point Likert scale; construct validity was measured by skill performance in four tasks.
Kawaguchi et al. (33) assessed face and content validity with a questionnaire designed by the authors; questions were answered by selecting a mark on an ordinal scale ranging from 1 to 5, and content validity was evaluated by experts only. Construct validity was tested by parameters of a task completed using the VR simulator. Arts et al. (34) described an augmented reality VR model and assessed face and content validity through a questionnaire based on a five-point Likert scale, while construct validity was assessed through outcome parameters generated by tracking software.
As already stated, important variation in Likert scales and questionnaire design is apparent in the included studies, which hinders data analysis and comparability of results (36).
COVID-19 and Simulation Models
The development of novel learning modalities, such as synthetic models and VR, has been accelerated during the COVID-19 pandemic (37). However, many simulation models are nascent in their development and entail high application costs (16). Currently, BS and VR models are the most prevalent simulation tools available; BS models are favoured for their visual and haptic realism at relatively low cost. The rapid advance of technology may expand the possibility of VR models using haptic feedback while lowering development and usage costs.
Strengths and Limitations
As already stated in the literature, the limitations of this type of review are related to restrictions in medical education research, ethical barriers, and the fact that no gold standard has been described for assessing surgical training outcomes (16, 38).
A major limitation of this review is the lack of inclusion of predictive validity, considered an “ultimate” assessment for establishing the validity of an educational tool (16, 36); nevertheless, it remains difficult to implement in practical terms. Additionally, the inability to compare data acquired by different researchers, owing to the heterogeneity of data-gathering tools and variations in Likert scales, hinders the acquisition of strong evidence on the use of simulation in different contexts.
Including only manuscripts that considered two types of validation, with at least one being content or construct validity, excluded many partially validated ST models that are being used worldwide. Restricting the population to general surgery residents also excluded multiple simulation scenarios that could be of importance.
Furthermore, the definition and selection of research participants for our review excluded studies that considered exclusively medical students; nevertheless, several manuscripts validated content by assessing performance improvement in laparoscopically “naive” subjects, defined as undergraduate students (39).
To our knowledge, however, this review is the first to address face, content, and construct validity in the validation of ST for laparoscopic surgery training. A recent SR published by Shah et al. (40) aimed to describe the status of simulation-based training tools in general surgery in the current literature, assess their validity, and determine their effectiveness; that systematic review may be complementary to ours. While Shah et al. included all published simulation models and described a variety of validations proposed by the authors, no distinction was made to highlight and review models with at least two kinds of validation, one of them being content or construct validity, the previously established validations that support the validity of ST as training tools to effectively develop LTS in surgical education (15).
Future work should aim to standardize validation scales that incorporate both subjective and objective assessments, such as the GOALS and OSATS scores for laparoscopic training. This will allow researchers to collect adequate data to validate multiple models and, in the future, to compare results between published STs.