The benefit of simulation lies in its ability to offer a controlled and standardized training environment without using patients for training purposes (8). Simulation also provides a method of training where objective outcomes are potentially measurable, which may help to ensure a level of competency (31). Simulated training may help to address the ethical concerns around new trainee surgeons operating on patients (19). It also provides additional training opportunities for trainees in procedures which are needed out of normal working hours or are not performed frequently, e.g. lateral canthotomy and cantholysis. Simulation allows detailed real-time feedback from a trainer in an environment where patient compliance or comfort is not a factor (19). Although such arguments can be made for the usefulness of simulation, they need to be supported by evidence.
This review demonstrated that simulated surgical procedures within GOO sub-specialities resulted in an increase in the confidence and comfort levels of trainees. Although the evidence on participants' technical skills was limited, it suggests that simulated surgical training can result in a decrease in operative time and an increase in objective skills assessment scores. None of the included studies measured improvements in patient care or downstream effects such as cost savings, instead reporting self-reported outcomes or improvement on the simulator tools only.
Animal models were the most extensively reported, followed by cadaveric models and artificial eyes. Cadaveric models provide access to human anatomy with no implications for patient safety but are often costly and may have limited availability. Animal models can closely resemble human anatomy and allow for direct tissue handling experience in a high-fidelity environment (18). Porcine anatomy is similar to that of humans (32–34), though the upper lid tarsal plate is shorter than in humans (34) and there is no lateral orbital rim (33). Mak reported that goat eyelid skin is thin, with a better subcutaneous plane and a more clearly defined orbicularis oculi than in pigs, and that the goat orbit is also similar to that of a human, with a thick infraorbital rim and a thin orbital floor (18). Furthermore, goats are available worldwide and cost relatively little. Mak reported that procedures such as entropion correction, blepharoplasty and ptosis correction can be reproducibly performed on the goat eye model, though it is not possible to perform lacrimal surgical procedures or endoscopic orbital decompression (18).
Four studies reported outcomes solely related to simulation on mannequin or artificial eye based models (26–29). High-fidelity artificial eyes are available for specific procedures such as trabeculectomy surgery and laser procedures (27, 28). Artificial eye and mannequin-based models provide a reproducible and consistent model for surgical training without any ethical or legal implications. Zhao reported that the mannequin model was perceived by trainees to be equivalent to emergency room experience and cadaveric heads but not as good as operating room experience (26). Trainees also reported that the mannequin simulator may be more beneficial early in their career, to gain experience with the surgical instruments and techniques before moving on to performing the procedures on patients. Compared with cadaveric models, which have been reported to cost from $200 to $5000 per pair of eyelids (35, 36), the reusable mannequin-based model reported cost $425, with replacement eyelid cartridges costing $12.50 each. Mannequin-based models are readily available and reusable (26), but do not replace the experience of operating on human eyelids: for example, there is no bleeding and the tissue planes do not dissect in the same way as human eyelids.
Validity is the degree to which an instrument measures what it sets out to measure (37). It can be measured in various ways: in this review the MERSQI definition and score were used, as part of the evaluation of overall study quality. Only one study, Dean (27), scored the maximum MERSQI mark of three out of three for validity of the evaluation instrument, while Almaliotis (22) scored two and Dang (20) scored one; all others scored zero. Similar findings have been reported by Cook (38), who showed a lack of instrument validity in studies in the medical education literature.
The MERSQI score was used to assess methodological quality. The scores of the included studies ranged from 6.5 to 17 out of 18 (mean 9). Lin suggests that a perfect score on the MERSQI is difficult to achieve (39), requiring a) a randomised controlled trial in more than two institutions, b) a response rate of >75%, c) objective data assessment, d) reporting of the internal structure, content validity and criterion validity of the evaluation instrument, e) appropriate data analysis beyond descriptive analysis, and f) outcomes measured at the level of patient/healthcare outcomes. In their review of 100 articles submitted to the Journal of General Internal Medicine (JGIM) special issue on medical education, the mean MERSQI score was 9.6 (range 5–15.5), with most of the manuscripts being single-institution studies and only 36% reporting validity evidence for the evaluation instrument. They found that the mean MERSQI score of accepted manuscripts was significantly higher than that of rejected manuscripts. Cook suggests that the focus should be on the item-specific scores in the MERSQI (40), which indicate the areas where methodological quality could be improved. The MERSQI score provides a benchmark for high-quality studies and is important to consider when designing studies.
The majority of the papers in this scoping review reported high levels of learner acceptability of the simulation models. Self-reported outcomes were given in 13 studies; in all of them, participants reported that simulation improved their surgical skills and confidence, an improvement maintained for three months in one study (19) and for one year in another (27).
The highest level of outcome was 2 (change in behaviours), with none of the studies reporting patient/healthcare outcomes (maximum 3) or other downstream effects such as cost savings. Most of the studies were designed to measure participant satisfaction. While Dean showed that the intervention group which received simulation training was 20 times more likely to perform trabeculectomy surgery than the control group, patient outcomes are not known (27). In theory, randomised controlled trials, with an intervention arm receiving simulation training and a control arm not receiving it, can be used to measure clinical and patient-related effects. However, such outcomes (for example, complication rates of surgeries) are difficult to attribute to a specific simulation intervention that may have taken place long before, with many other factors, potentially including other educational interventions, involved in the interim. Furthermore, withholding simulation training from certain groups raises ethical issues for both the trainees themselves and the patients they eventually treat.
A strength of this review is its focus on a newly evolving area within ophthalmology training programmes: simulation in GOO. A robust search strategy and a validated scoring system for methodological quality were used. However, the heterogeneity of the outcomes reported in the studies meant that synthesis of outcomes was not attempted. Interest in the efficacy of simulation for GOO surgical training may be increasing, with nine of the fifteen papers published in 2021 (41). This review provides a useful overview of the literature on models of simulation being developed for GOO surgery training, highlights gaps to enable future researchers to focus their studies, and makes recommendations on how the methodology of future studies can be strengthened.